<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Index of /dev/null</title>
    <link rel="alternate" type="text/html" href="http://www.autobugfix.com/" />
    <link rel="self" type="application/atom+xml" href="http://www.autobugfix.com/atom.xml" />
    <id>tag:www.autobugfix.com,2013-04-04://3</id>
    <updated>2013-05-01T21:13:17Z</updated>
    <subtitle>random experiments</subtitle>
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type 5.2.3</generator>

<entry>
    <title>Gnu Libidn EsZett Hotfix</title>
    <link rel="alternate" type="text/html" href="http://www.autobugfix.com/2010/11/gnu-libidn-eszett-hotfix.html" />
    <id>tag:www.autobugfix.com,2010://3.54</id>

    <published>2010-11-01T16:11:35Z</published>
    <updated>2013-05-01T21:13:17Z</updated>

    <summary><![CDATA[ This is not an update of Libidn to IDNA2008. The goal is to be able to suppress, by simple means and on demand, the IDNA2003 mapping of code points in the PVALID category (RFC 5892), in particular &quot;&#xDF;&quot; (U+00DF; LATIN SMALL LETTER SHARP S). The starting point...]]></summary>
    <author>
        <name>admin</name>
        <uri>http://www.autobugfix.com</uri>
    </author>
    
        <category term=".NET" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="algorithms" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="idna" label="IDNA" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="libidn" label="Libidn" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="programming" label="programming" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="unicode" label="unicode" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.autobugfix.com/">
        <![CDATA[<p>
<strong>This is not an update of Libidn to IDNA2008.</strong> The goal is to be able to suppress, by simple means and on demand, the IDNA2003 mapping of code points in the <code>PVALID</code> category (RFC 5892), in particular &quot;&#xDF;&quot; (<code>U+00DF; LATIN SMALL LETTER SHARP S</code>). The starting point is the need to process <a href="http://www.autobugfix.com/2010/10/lokalisiert-das-kleine-eszett-im-world-wide-web.html">domain names containing &quot;&#xDF;&quot;</a> within the DE zone at short notice.
</p>

]]>
        <![CDATA[<p>
The changes are described here only for the C# version in <a href="http://ftp.gnu.org/gnu/libidn/" target="_blank">libidn-1.9</a> and may need to be ported to other languages or versions.
</p>

<p>
The goal is to extend the functions <code>ToASCII</code> and <code>ToUnicode</code> with a parameter <code>bool useIDNA2008</code> that exempts <code>PVALID</code> code points from the mapping.
</p>

<p>
(1) Add a class <code>IDNA2008</code> that holds the <code>PVALID</code> code points to be exempted (the PVALID exceptions listed in RFC 5892, section 2.6):
</p>
<pre>
  // IDNA2008.cs
  namespace Gnu.Inet.Encoding {
    class IDNA2008
    {
      // rfc5892 PVALID codepoints
      public static char[] PVALID = new char[] {
        '\u00DF', // LATIN SMALL LETTER SHARP S
        '\u03C2', // GREEK SMALL LETTER FINAL SIGMA
        '\u06FD', // ARABIC SIGN SINDHI AMPERSAND
        '\u06FE', // ARABIC SIGN SINDHI POSTPOSITION ME
        '\u0F0B', // TIBETAN MARK INTERSYLLABIC TSHEG
        '\u3007'  // IDEOGRAPHIC NUMBER ZERO
      };
    }
  }
</pre>

<p>
(2) Overload the function <code>Map</code> (Stringprep.cs) with a fourth parameter that lists the code points to ignore:
</p>

<pre>
  internal static void Map(StringBuilder s, char[] search, string[] replace, <span style="color:red">char[] ignore</span>)
  {
    for (int i = 0; i < search.Length; i++)
    {
      char c = search[i];

      <span style="color:red">// check if c should be ignored
      bool ign = false;
      for (int t = 0; t < ignore.Length; t++)
      {
        if (ignore[t] == c)
        {
          ign = true;
          break;
        }
      }
      if (ign)
        continue;</span>

      int j = 0;
      while (j < s.Length)
      {
        if (c == s[j])
        {
          //s.deleteCharAt(j);
          s.Remove(j, 1);
          if (null != replace[i])
          {
            s.Insert(j, replace[i]);
            j += replace[i].Length - 1;
          }
        }
        else
        {
          j++;
        }
      }
    }
  }
</pre>
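<p>
As a language-neutral illustration, the idea behind the overloaded <code>Map</code> can be sketched in a few lines of Python. This is only a sketch under assumptions: the name <code>map_with_ignore</code> and the three-entry <code>TOY_TABLE</code> are hypothetical and stand in for the real B.2 search/replace tables; a value of <code>None</code> plays the role of a <code>null</code> replacement (deletion).
</p>

```python
def map_with_ignore(text, table, ignore=()):
    """Replace each character found in `table` unless it is listed in `ignore`.

    `table` maps a single character to its replacement string; a value of
    None deletes the character (hypothetical stand-in for the B.2 tables).
    """
    out = []
    for ch in text:
        if ch in ignore or ch not in table:
            out.append(ch)              # untouched or explicitly exempted
        elif table[ch] is not None:
            out.append(table[ch])       # mapped to its replacement
        # a None replacement drops the character
    return "".join(out)


# Toy excerpt of a mapping table, for illustration only.
TOY_TABLE = {"\u00df": "ss", "\u00c4": "\u00e4", "\u00ad": None}

word = "t\u00e4\u00dft"                                 # "täßt"
print(map_with_ignore(word, TOY_TABLE))                 # ß is mapped to "ss"
print(map_with_ignore(word, TOY_TABLE, {"\u00df"}))     # ß is kept
```

<p>
Unlike the C# loop, which walks the whole search table for every call, the sketch looks up each input character once; the observable effect of the ignore list is the same.
</p>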

<p>
(3) Overload the function <code>NamePrep</code> (Stringprep.cs) with a third parameter <code>bool useIDNA2008</code>:
</p>

<pre>
  public static string NamePrep(string input, bool allowUnassigned, <span style="color:red">bool useIDNA2008</span>)
  {
    if (input == null)
    {
      throw new System.NullReferenceException();
    }

    StringBuilder s = new StringBuilder(input);

    if (!allowUnassigned && Contains(s, RFC3454.A1))
    {
      throw new StringprepException(StringprepException.CONTAINS_UNASSIGNED);
    }

    Filter(s, RFC3454.B1);

    // EsZett Hotfix
    if (useIDNA2008)
      <span style="color:red">Map(s, RFC3454.B2search, RFC3454.B2replace, IDNA2008.PVALID);</span>
    else
      Map(s, RFC3454.B2search, RFC3454.B2replace);


    s = new StringBuilder(NFKC.NormalizeNFKC(s.ToString()));
    // B.3 is only needed if NFKC is not used, right?
    // map(s, RFC3454.B3search, RFC3454.B3replace);

    if (Contains(s, RFC3454.C12) || Contains(s, RFC3454.C22) || Contains(s, RFC3454.C3) || Contains(s, RFC3454.C4) || Contains(s, RFC3454.C5) || Contains(s, RFC3454.C6) || Contains(s, RFC3454.C7) || Contains(s, RFC3454.C8))
    {
      // Table C.9 only contains code points > 0xFFFF which Java
      // doesn't handle
      throw new StringprepException(StringprepException.CONTAINS_PROHIBITED);
    }

    // Bidi handling
    bool r = Contains(s, RFC3454.D1);
    bool l = Contains(s, RFC3454.D2);

    // RFC 3454, section 6, requirement 1: already handled above (table C.8)

    // RFC 3454, section 6, requirement 2
    if (r && l)
    {
      throw new StringprepException(StringprepException.BIDI_BOTHRAL);
    }

    // RFC 3454, section 6, requirement 3
    if (r)
    {
      if (!Contains(s[0], RFC3454.D1) || !Contains(s[s.Length - 1], RFC3454.D1))
      {
        throw new StringprepException(StringprepException.BIDI_LTRAL);
      }
    }

    return s.ToString();
  }
</pre>
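<p>
For comparison, the mapping part of this step can be sketched with Python's standard library, which ships the RFC 3454 tables in its <code>stringprep</code> module. The function name <code>nameprep_map</code> and the flag <code>use_idna2008</code> are hypothetical; the sketch covers only the B.1 filter, the B.2 mapping and NFKC, and leaves out the prohibited-character and bidi checks shown above.
</p>

```python
import stringprep
import unicodedata

# The six exempted code points from the IDNA2008 class above.
PVALID = {"\u00df", "\u03c2", "\u06fd", "\u06fe", "\u0f0b", "\u3007"}


def nameprep_map(label, use_idna2008=False):
    """Apply the B.1 filter, the B.2 mapping and NFKC to a single label.

    With use_idna2008=True, code points in PVALID skip the B.2 mapping.
    """
    out = []
    for ch in label:
        if stringprep.in_table_b1(ch):
            continue                                  # "mapped to nothing"
        if use_idna2008 and ch in PVALID:
            out.append(ch)                            # exempt from mapping
        else:
            out.append(stringprep.map_table_b2(ch))   # case folding, ß to ss
    return unicodedata.normalize("NFKC", "".join(out))


word = "t\u00e4\u00dft"                  # "täßt"
print(nameprep_map(word))                # IDNA2003 behaviour
print(nameprep_map(word, True))          # ß survives
```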

<p>
(4) Overload the functions <code>ToASCII</code> and <code>ToUnicode</code> (IDNA.cs) to pass the parameter
<code>bool useIDNA2008</code> through to <code>NamePrep</code>:
</p>

<pre>
  public static string ToASCII(string input, bool allowUnassigned, bool useSTD3ASCIIRules, <span style="color:red">bool useIDNA2008</span>)
  {
    // Step 1: Check if the string contains code points outside
    //         the ASCII range 0..0x7f.

    bool nonASCII = false;

    for (int i = 0; i < input.Length; i++)
    {
      int c = input[i];
      if (c > 0x7f)
      {
        nonASCII = true;
        break;
      }
    }

    // Step 2: Perform the nameprep operation.

    if (nonASCII)
    {
      try
      {
        input = Stringprep.NamePrep(input, allowUnassigned, <span style="color:red">useIDNA2008</span>);
      }
      catch (StringprepException e)
      {
        // TODO
        throw new IDNAException(e);
      }
    }

    // Step 3: - Verify the absence of non-LDH ASCII code points
    //    (char) 0..0x2c, 0x2e..0x2f, 0x3a..0x40, 0x5b..0x60,
    //    (char) 0x7b..0x7f
    //         - Verify the absence of leading and trailing
    //           hyphen-minus

    if (useSTD3ASCIIRules)
    {
      for (int i = 0; i < input.Length; i++)
      {
        int c = input[i];
        if ((c <= 0x2c) || (c >= 0x2e && c <= 0x2f) || (c >= 0x3a && c <= 0x40) || (c >= 0x5b && c <= 0x60) || (c >= 0x7b && c <= 0x7f))
        {
          throw new IDNAException(IDNAException.CONTAINS_NON_LDH);
        }
      }

      if (input.StartsWith("-") || input.EndsWith("-"))
      {
        throw new IDNAException(IDNAException.CONTAINS_HYPHEN);
      }
    }

    // Step 4: If all code points are inside 0..0x7f, skip to step 8

    nonASCII = false;

    for (int i = 0; i < input.Length; i++)
    {
      int c = input[i];
      if (c > 0x7f)
      {
        nonASCII = true;
        break;
      }
    }

    string output = input;

    if (nonASCII)
    {

      // Step 5: Verify that the sequence does not begin with the ACE prefix.

      if (input.StartsWith(ACE_PREFIX))
      {
        throw new IDNAException(IDNAException.CONTAINS_ACE_PREFIX);
      }

      // Step 6: Punycode

      try
      {
        output = Punycode.Encode(input);
      }
      catch (PunycodeException e)
      {
        // TODO
        throw new IDNAException(e);
      }

      // Step 7: Prepend the ACE prefix.

      output = ACE_PREFIX + output;
    }

    // Step 8: Check that the length is inside 1..63.

    if (output.Length < 1 || output.Length > 63)
    {
      throw new IDNAException(IDNAException.TOO_LONG);
    }

    return output;
  }


  public static string ToUnicode(string input, bool allowUnassigned, bool useSTD3ASCIIRules, <span style="color:red">bool useIDNA2008</span>)
  {
    string original = input;
    bool nonASCII = false;

    // Step 1: If all code points are inside 0..0x7f, skip to step 3.

    for (int i = 0; i < input.Length; i++)
    {
      int c = input[i];
      if (c > 0x7f)
      {
        nonASCII = true;
        break;
      }
    }

    // Step 2: Perform the Nameprep operation.

    if (nonASCII)
    {
      try
      {
        input = Stringprep.NamePrep(input, allowUnassigned, <span style="color:red">useIDNA2008</span>);
      }
      catch (StringprepException e)
      {
        // ToUnicode never fails!
        return original;
      }
    }

    // Step 3: Verify the sequence starts with the ACE prefix.

    if (!input.StartsWith(ACE_PREFIX))
    {
      // ToUnicode never fails!
      return original;
    }

    string stored = input;

    // Step 4: Remove the ACE prefix.

    input = input.Substring(ACE_PREFIX.Length);

    // Step 5: Decode using punycode

    string output;

    try
    {
      output = Punycode.Decode(input);
    }
    catch (PunycodeException e)
    {
      // ToUnicode never fails!
      return original;
    }

    // Step 6: Apply toASCII

    string ascii;

    try
    {
      ascii = ToASCII(output, allowUnassigned, useSTD3ASCIIRules, <span style="color:red">useIDNA2008</span>);
    }
    catch (IDNAException e)
    {
      // ToUnicode never fails!
      return original;
    }

    // Step 7: Compare case-insensitively.

    if (!ascii.ToUpper().Equals(stored.ToUpper()))
    {
      // ToUnicode never fails!
      return original;
    }

    // Step 8: Return the result.

    return output;
  }
</pre>

<p>
(5) Test <code>ToASCII</code> and <code>ToUnicode</code> with and without IDNA2008:
</p>

<pre>
  [TestMethod()]
  public void Test030_EsZett_IDNA2003()
  {

    string u1 = "täßt";

    // Nameprep IDNA2003 should send "täßt" to "tässt"
    string u2 = Stringprep.NamePrep(u1, false, false);
    Assert.AreEqual("tässt", u2);

    // ToAscii IDNA2003 should send both "täßt" and "tässt" to "xn--tsst-loa"
    string a1 = IDNA.ToASCII(u1, false, true, false);
    Assert.AreEqual("xn--tsst-loa", a1);

    string a2 = IDNA.ToASCII(u2, false, true, false);
    Assert.AreEqual(a1, a2);

    // ToUnicode IDNA2003 should send "xn--tsst-loa" to "tässt"
    string u3 = IDNA.ToUnicode(a1, false, true, false);
    Assert.AreEqual(u2, u3);

  }

  [TestMethod()]
  public void Test040_EsZett_IDNA2008()
  {

    string u1 = "täßt";

    // Nameprep IDNA2008 should send "täßt" to "täßt"
    string u2 = Stringprep.NamePrep(u1, false, true);
    Assert.AreEqual(u1, u2);

    // ToAscii IDNA2008 should send "täßt" to "xn--tt-giat"
    string a1 = IDNA.ToASCII(u1, false, true, true);
    Assert.AreEqual("xn--tt-giat", a1);

    string a2 = IDNA.ToASCII(u2, false, true, true);
    Assert.AreEqual(a1, a2);

    // ToUnicode IDNA2008 should send "xn--tt-giat" to "täßt"
    string u3 = IDNA.ToUnicode(a1, false, true, true);
    Assert.AreEqual(u2, u3);

  }
</pre>
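<p>
The expected ACE strings in these tests can be cross-checked independently of Libidn. The following Python sketch assumes only the standard <code>idna</code> codec (which implements IDNA2003) and the standard <code>punycode</code> codec; the helper names <code>to_ascii_2003</code>, <code>to_ascii_keep</code> and <code>to_unicode</code> are hypothetical, and <code>to_ascii_keep</code> skips Nameprep entirely instead of exempting single code points.
</p>

```python
def to_ascii_2003(label):
    """IDNA2003 ToASCII via Python's built-in codec (maps ß to ss)."""
    return label.encode("idna").decode("ascii")


def to_ascii_keep(label):
    """Hypothetical helper: Punycode the label as is, without any mapping."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def to_unicode(ace):
    """Decode an ACE label back to Unicode; other labels pass through."""
    if not ace.lower().startswith("xn--"):
        return ace
    return ace[4:].encode("ascii").decode("punycode")


label = "t\u00e4\u00dft"            # "täßt"
print(to_ascii_2003(label))         # xn--tsst-loa
print(to_ascii_keep(label))         # xn--tt-giat
print(to_unicode("xn--tt-giat"))    # the original label with ß
```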
]]>
    </content>
</entry>

<entry>
    <title>Localized: The Little Eszett on the World Wide Web</title>
    <link rel="alternate" type="text/html" href="http://www.autobugfix.com/2010/10/lokalisiert-das-kleine-eszett-im-world-wide-web.html" />
    <id>tag:www.autobugfix.com,2010://3.17</id>

    <published>2010-10-31T14:36:51Z</published>
    <updated>2013-04-27T23:20:46Z</updated>

    <summary>As always when something keeps getting bigger and more complicated, a trend toward localization emerges. The Internet is a topological space that has become so high-dimensional that it can only be explained as a covering of some incomprehensible something by local maps. The powerhouse of globalization now longs for semantics and seeks social contacts and organic structures. It wants to be close to people, know their neighborhood, and speak their dialect.</summary>
    <author>
        <name>admin</name>
        <uri>http://www.autobugfix.com</uri>
    </author>
    
        <category term="META" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="domains" label="domains" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="idna" label="IDNA" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="unicode" label="unicode" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.autobugfix.com/">
        <![CDATA[<p>
As always when something keeps getting bigger and more complicated, a trend toward localization emerges. The Internet is a topological space that has become so high-dimensional that it can only be explained as a covering of some incomprehensible something by local maps. The powerhouse of globalization now longs for semantics and seeks social contacts and organic structures. It wants to be close to people, know their neighborhood, and speak their dialect.
</p>
]]>
        <![CDATA[<p>
The old global and normative solutions are being refined and adapted to actual needs. Take the IDNA standard, for example. The intention of IDNA is to define mappings that allow the diverse and peculiar information spectra of arbitrary local entities to communicate on top of ancient, hard-to-change protocols such as the DNS.
</p>

<p>
In August 2010 the IETF published the IDNA2008 specifications in RFCs <a target="_blank" href="http://tools.ietf.org/html/rfc5890">5890</a> - <a target="_blank" href="http://tools.ietf.org/html/rfc5894">5894</a>. An overview of the differences between IDNA2003 and IDNA2008 is given in <a target="_blank" href="http://unicode.org/reports/tr46/">Unicode Technical Standard #46 (Unicode IDNA Compatibility Processing)</a>. Until then, IDNA was a global function, essentially consisting of the Nameprep mapping and the Punycode algorithm, where Nameprep is effectively defined by a series of tables that map, for example, uppercase to lowercase letters and &quot;&szlig;&quot; to &#8220;ss&#8221;. The old standard thus placed the responsibility for treating different languages consistently inside the IDNA protocol itself. This solution was global, simple, and inflexible.
</p>

<p>
IDNA2008 introduced a new terminology intended to do justice to the local diversity of applications and users and to guarantee openness toward future Unicode versions. The Eszett, namely the code point <code>U+00DF (LATIN SMALL LETTER SHARP S)</code>, was admitted to the category <code>PVALID</code> (Protocol Valid) as an exception at DENIC's instigation. So while the German ligature <em>&szlig;</em> is now resolved as <code>xn--zca</code> in the DE zone, it may at the same time be resolved as <code>ss</code> in another zone. A contribution to localization, and to booming business for ISPs and lawyers.
</p>

<p>
This code point is now a <a target="_blank" href="http://unicode.org/reports/tr46/#Deviations">deviation</a>, i.e. different applications may process it differently. Imagine Alice accessing her account at <em>http://www.sparkasse-gießen.de</em> from home. Her browser supports IDNA2003, so it maps the name to <em>http://www.sparkasse-giessen.de</em> and resolves it to the IP address of the savings bank's server. Then she visits her friend Bob and checks her balance there. Bob's browser supports IDNA2008, so for the same input it uses <em>http://www.xn--sparkasse-gieen-2ib.de</em> instead, which may resolve to an entirely different IP address. Eve's phishing server could be answering there, allowing her to spy on Alice's credentials.
</p>
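<p>
The two host names from the Alice and Bob scenario can be reproduced with a small Python sketch. It assumes Python's built-in <code>idna</code> codec for the IDNA2003 side; the IDNA2008 side is only approximated by Punycode-encoding each non-ASCII label unchanged, and the helper name <code>ace</code> is hypothetical.
</p>

```python
host = "www.sparkasse-gie\u00dfen.de"    # www.sparkasse-gießen.de

# IDNA2003 (Python's built-in codec): ß is mapped to "ss" before the lookup.
idna2003 = host.encode("idna").decode("ascii")


def ace(label):
    """Approximate IDNA2008: Punycode a non-ASCII label without mapping it."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


idna2008 = ".".join(ace(label) for label in host.split("."))

print(idna2003)    # www.sparkasse-giessen.de
print(idna2008)    # www.xn--sparkasse-gieen-2ib.de
```

<p>
Both strings are valid DNS names, which is exactly why the same user input can end up at two different servers.
</p>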

<p>
And when, in the end, the browser despairs of the encoding of the discussion posts, it all seems somehow self-referential:
</p>

<p>
<img src="http://www.autobugfix.com/2010/10/31/eszett1.jpg" alt="discussions on ietf.idnabis" />

Ãh??? :-)
</p>

<p>
For the ISP developer, however, localization is always a challenge. He is responsible for making sure the local units can communicate among and with each other. And who still wants to converse in ASCII these days.
</p>

<p>
So one day the little &quot;&szlig;&quot; lands on your desk and declares itself valid. Of course, the IDNA software buried deep in every system in hundreds of places knows nothing of the new RFCs yet. The .NET <code>System.Globalization</code> namespace, which until then had served reliably in normalizing, validating, and converting even the most outlandish characters, cannot be persuaded under any circumstances to accept a &quot;&szlig;&quot; in domain names. And what such a DLL cannot do, it simply cannot do - well, on the high seas, in court, and before Microsoft &#8230;
</p>

<p>
On October 26, 2010, DENIC announced that it would support IDNA2008 practically immediately, starting with a sunrise phase that gave all holders of domains containing &#8220;ss&#8221; the opportunity to register the counterpart with &quot;&szlig;&quot;. An ad hoc solution was needed within a few hours. The only obvious option was to script automated queries against the <a target="_blank" href="http://www.denic.de/domains/internationalized-domain-names/idn-konvertierung.html">DENIC web tool</a>, which at that moment offered the only known &szlig;-capable conversion mechanism. Our trainee implemented that promptly, while RFCs were hastily studied and a sustainable solution was sought. That solution is the GNU IDN library <a target="_blank" href="http://www.gnu.org/software/libidn/">Libidn</a>. It cannot handle &quot;&szlig;&quot; yet either, but it is quickly patched. The code is available in C, Java, and C#, so there is something for everyone.
</p>

<p>
<img src="http://www.autobugfix.com/2010/10/31/nameprep1.jpg" alt="LibIDN source code" />
</p>
]]>
    </content>
</entry>

<entry>
    <title>PostScript: the beauty of thinking backwards</title>
    <link rel="alternate" type="text/html" href="http://www.autobugfix.com/2010/09/postscript-the-beauty-of-thinking-backwards.html" />
    <id>tag:www.autobugfix.com,2010://3.18</id>

    <published>2010-09-20T22:46:38Z</published>
    <updated>2013-04-04T17:14:13Z</updated>

    <summary>Lately, while snooping around in old coding stuff, I rediscovered an ancient treasure: the PostScript programming language. It&apos;s older than dirt and sits deep down at the bottom of Adobe&apos;s PDF technology. Intended to be an efficient and highly flexible printer instruction language, it&apos;s actually a Turing-complete programming language, based on a very clear and surprisingly simple stack-oriented concept, outputting geometric results to suitable devices.</summary>
    <author>
        <name>admin</name>
        <uri>http://www.autobugfix.com</uri>
    </author>
    
        <category term="etc." scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="postscript" label="PostScript" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.autobugfix.com/">
        <![CDATA[<!--
PostScript: the beauty of thinking backwards -->
<p>
Lately, while snooping around in old coding stuff, I rediscovered
an ancient treasure: <em>the PostScript programming language</em>.
It's older than dirt and sits deep down at the bottom of Adobe's PDF
technology. Intended to be an efficient and highly flexible printer
instruction language, it's actually a Turing-complete programming language,
based on a very clear and surprisingly simple stack-oriented concept,
outputting geometric results to suitable devices.
</p>
]]>
        <![CDATA[<p>
Everything is on a stack here.
That is, operators consume items from the operand stack,
where operands are objects of various types.
Thus, calculations are denoted in reverse Polish notation:
</p>
<collapsible>
<pre class="code">
3 1 sub 2 div                             % (3 - 1) / 2
</pre>
</collapsible>
<p>
or, equivalently, and even more Polish ;)
</p>
<collapsible>
<pre class="code">
2 3 1 sub exch div                        % (3 - 1) / 2
</pre>
</collapsible>
<p>
where <span class="code">exch</span> exchanges the two topmost operands.
PostScript procedures are <em>executable arrays</em>, i.e. objects consisting of a series of
operators and operands, enclosed in braces:
</p>
<collapsible>
<pre class="code">
{ 2 div } 3 1 sub exch exec               % (3 - 1) / 2
</pre>
</collapsible>
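<p>
To make the stack discipline concrete, here is a minimal sketch of an evaluator for just the operators used so far, written in Python. It is not a PostScript interpreter: the function names <code>parse</code>, <code>run</code> and <code>evaluate</code> are hypothetical, all numbers are treated as floats, and only <code>sub</code>, <code>div</code>, <code>exch</code> and <code>exec</code> are supported.
</p>

```python
def parse(source):
    """Turn PostScript-like text into tokens; { ... } becomes a nested list."""
    root = []
    open_blocks = [root]
    for word in source.replace("{", " { ").replace("}", " } ").split():
        if word == "{":
            block = []
            open_blocks[-1].append(block)
            open_blocks.append(block)
        elif word == "}":
            open_blocks.pop()
        else:
            try:
                open_blocks[-1].append(float(word))
            except ValueError:
                open_blocks[-1].append(word)      # an operator name
    return root


def run(tokens, stack):
    """Execute tokens against the operand stack and return the stack."""
    for tok in tokens:
        if isinstance(tok, (float, list)):
            stack.append(tok)                     # operands and procedures are pushed
        elif tok == "sub":
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)
        elif tok == "div":
            b, a = stack.pop(), stack.pop()
            stack.append(a / b)
        elif tok == "exch":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif tok == "exec":
            run(stack.pop(), stack)               # run the procedure on top
        else:
            raise ValueError("unknown operator: " + tok)
    return stack


def evaluate(source):
    return run(parse(source), [])


print(evaluate("3 1 sub 2 div"))                  # [1.0]
print(evaluate("2 3 1 sub exch div"))             # [1.0]
print(evaluate("{ 2 div } 3 1 sub exch exec"))    # [1.0]
```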
<p>
Executable objects may be executed at any time using the
<span class="code">exec</span> operator.
Hence, functional concepts such as lambda expressions and lazy
evaluation immediately suggest themselves.
A complete script using certain font and graphic operators might look as follows:
</p>
<collapsible>
<pre class="code">
%!PS-Adobe-3.0 EPSF-3.0
%%BoundingBox: 0 0 170 50

% load standard font
24 /TimesBold findfont exch scalefont setfont
.25 setlinewidth
1 setgray                                 % white

gsave                                     % push graphic state
{                                         % push lambda expression
  gsave                                   % push graphic state
  (transformations) dup                   % duplicate string
  0 0 moveto show                         % print filled
  0 setgray                               % set black
  0 0 moveto true charpath stroke         % print outlined
  grestore                                % pop graphic state
} 
.6 .05 1.1 {                              % loop condition
  setgray                                 % using loop iterator
  .8 dup 13 dup translate scale           % matrix transformation
  -4 rotate
  dup exec                                % execute expression
} for                                     % loop operator
grestore                                  % pop graphic state
exec                                      % finally pop and execute

0 setgray                                 % black
[-1 0 0 1 150 30] concat                  % matrix transformation
(affine) 0 0 moveto show                  % print

showpage
%%EOF
</pre>
<img src="http://source.beta-blog.net/blog/2010/09/affine1.jpg" alt="affine transformations" style="width:600px;height:189px" />
</collapsible>
<p>
Rendering is globally controlled by a transformation matrix, defining an
affine transformation (i.e. a linear mapping plus translation) that can be
modified either by the <span class="code">translate</span>, <span class="code">scale</span>, and
<span class="code">rotate</span> operators or by
explicit left-sided matrix multiplication using the <span class="code">concat</span> operator.
</p>
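<p>
The arithmetic behind such a matrix can be sketched in Python. A PostScript matrix <code>[a b c d tx ty]</code> maps a point to <code>(a*x + c*y + tx, b*x + d*y + ty)</code>, and <code>concat</code> multiplies the given matrix onto the current one from the left. The function names <code>apply</code> and <code>concat</code> below are hypothetical helpers for illustration, not an API.
</p>

```python
def apply(m, x, y):
    """Transform the point (x, y) by the matrix [a b c d tx ty]."""
    a, b, c, d, tx, ty = m
    return (a * x + c * y + tx, b * x + d * y + ty)


def concat(m, ctm):
    """Return m x ctm: m is applied first, then the previous matrix ctm."""
    a1, b1, c1, d1, tx1, ty1 = m
    a2, b2, c2, d2, tx2, ty2 = ctm
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        tx1 * a2 + ty1 * c2 + tx2,
        tx1 * b2 + ty1 * d2 + ty2,
    )


IDENTITY = (1, 0, 0, 1, 0, 0)
MIRROR = (-1, 0, 0, 1, 150, 30)     # the matrix used for "affine" above

ctm = concat(MIRROR, IDENTITY)
print(apply(ctm, 0, 0))             # (150, 30)
print(apply(ctm, 10, 0))            # (140, 30)
```

<p>
The x axis is mirrored and the origin moves to (150, 30), which is why the word <em>affine</em> is printed right to left in the script above.
</p>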
<p>
A named procedure in the usual manner is obtained as an <em>executable literal</em>
that can be stored within the current dictionary.
That is, the <span class="code">def</span> operator is used to manage the symbol table:
</p>
<collapsible>
<pre class="code">
/div2 { 2 div } def
3 1 sub div2                              % (3 - 1) / 2
</pre>
</collapsible>
<p>
Now, as an apprentice piece of recursive geometry, let's see how
<a href="http://en.wikipedia.org/wiki/Koch_snowflake" target="_blank">Koch's famous Snowflake</a> works:
</p>
<collapsible>
<pre class="code">
% t s koch =&gt; -
/koch {
  dup depth le {     % s &lt;= depth
    1 add exch % s =&gt; s+1
    3 div exch % t =&gt; t/3

    % prepare stack:
    % t/3 s+1 60 t/3 s+1 240 t/3 s+1 60 t/3 s+1
    60 3 copy 180 add 3 copy 180 sub 3 copy pop
    koch rotate koch rotate koch rotate koch
  } 
  { pop 0 rlineto }         % draw
  ifelse
} def
</pre>
<img id="koch_img" src="http://source.beta-blog.net/blog/2010/09/koch_5.jpg" alt="Koch curve" style="width:582px;height:181px" />
</collapsible>
<script type="text/javascript">/*<![CDATA[*/
var koch_img_c=3,koch_img_src = new Array();
for (var i=0;i<7;i++)
  (koch_img_src[i]=new Image()).src='http://source.beta-blog.net/blog/2010/09/koch_'+i+'.jpg';
xLib.onLoad(function(){
  window.setInterval(function(){
    xLib.$('koch_img').src=koch_img_src[(koch_img_c++)%7].src;
  },1000);
})/*]]&gt;*/</script>
<p>
The <span class="code">koch</span> function consumes a length and a step parameter
from the stack, while <span class="code">depth</span> is defined in the global dictionary.
The <span class="code">ifelse</span> operator decides whether to dive into recursion, incrementing the step value,
or to stop and draw a line segment.
So,
</p>
<collapsible>
<pre class="code">
/depth 5 def
newpath 0 0 moveto 100 1 koch stroke
</pre>
</collapsible>
<p>
displays the curve of length 100 with a recursion depth of 5.
</p>
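<p>
The same recursion can be traced outside of PostScript. The following Python sketch mirrors the procedure above under assumptions: instead of rotating the coordinate system it keeps an explicit heading in degrees, and it collects the end points of the line segments rather than drawing them. The name <code>koch_points</code> is hypothetical.
</p>

```python
import math


def koch_points(length, depth):
    """Return the vertices of the Koch curve drawn by `length 1 koch`."""
    points = [(0.0, 0.0)]
    heading = 0.0                        # degrees, counterclockwise

    def koch(t, s):
        nonlocal heading
        if s > depth:
            # base case: draw a segment of length t along the heading
            x, y = points[-1]
            rad = math.radians(heading)
            points.append((x + t * math.cos(rad), y + t * math.sin(rad)))
            return
        # four sub-curves of a third of the length, turning 60, 240, 60 degrees
        for turn in (60, 240, 60, 0):
            koch(t / 3, s + 1)
            heading += turn

    koch(length, 1)
    return points


pts = koch_points(100, 5)
print(len(pts) - 1)                      # number of line segments
print(pts[-1])                           # end point, close to (100, 0)
```

<p>
With a depth of 5 the curve consists of 4<sup>5</sup> = 1024 segments, each of length 100 / 3<sup>5</sup>.
</p>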
<p>
For reference, see the <a target="_blank" href="http://www-cdf.fnal.gov/offline/PostScript/PLRM2.pdf">PostScript Language Reference Manual</a>,
the <a target="_blank" href="http://www-cdf.fnal.gov/offline/PostScript/BLUEBOOK.PDF">Blue Book</a>, and also <a href="http://www.rightbrain.com/download/books/ThinkingInPostScript.pdf" target="_blank">Thinking In PostScript</a> by Glenn Reid.
</p>]]>
    </content>
</entry>

<entry>
    <title>Perl module of the day</title>
    <link rel="alternate" type="text/html" href="http://www.autobugfix.com/2010/08/perl-module-of-the-day-1.html" />
    <id>tag:www.autobugfix.com,2010://3.19</id>

    <published>2010-08-12T23:34:49Z</published>
    <updated>2013-04-23T22:56:26Z</updated>

    <summary>Control where you go when you die() ...</summary>
    <author>
        <name>admin</name>
        <uri>http://www.autobugfix.com</uri>
    </author>
    
        <category term="META" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Perl" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="perl" label="Perl" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.autobugfix.com/">
        <![CDATA[<p>
Control where you go when you <code class="perl"><a class="kwd" href="http://perldoc.perl.org/functions/die.html" target="_blank" rel="help nofollow">die</a>()</code> ... 
</p>
]]>
        <![CDATA[<collapsible>
<pre class="code"><code class="perl"><span class="cmnt">#!/usr/bin/perl</span>
<a class="kwd" href="http://perldoc.perl.org/functions/use.html" target="_blank" rel="help,nofollow">use</a> <a class="pkg" href="http://search.cpan.org/search?query=Religion&amp;mode=module" target="_blank" rel="help,nofollow" title="Module Religion">Religion</a><span class="op stmt">;</span>

<span class="var">$<span class="symb">Die</span>::<span class="symb">Handler</span></span> = <span class="symb">new</span> <span class="symb">DieHandler</span>  <a class="kwd" href="http://perldoc.perl.org/functions/sub.html" target="_blank" rel="help,nofollow">sub</a> <span class="op ld">{</span>
  <a class="kwd" href="http://perldoc.perl.org/functions/die.html" target="_blank" rel="help,nofollow">die</a> <span class="qlo q"><span class="kwd">q</span><span class="op">/</span><span class="str">Goodbye, cruel world</span><span class="op">/</span></span><span class="op stmt">;</span>
<span class="op rd">}</span><span class="op stmt">;</span>

<span class="symb">__END__</span></code></pre>
</collapsible>
<p>
Well, the <a href="http://search.cpan.org/perldoc?Religion" target="_blank">Religion module</a> will celebrate its 15th birthday this year!  *dance*
</p>]]>
    </content>
</entry>

<entry>
    <title>xn--bullshit is a valid IDN application</title>
    <link rel="alternate" type="text/html" href="http://www.autobugfix.com/2010/07/xn--bullshit-is-a-valid-idn-application.html" />
    <id>tag:www.autobugfix.com,2010://3.20</id>

    <published>2010-07-06T16:54:36Z</published>
    <updated>2013-04-01T14:56:08Z</updated>

    <summary>Indeed, it is! Accidentally, while testing some random stuff against my IDN validation function, I found out that the word bullshit is the result of the Punycode Algorithm applied to the Unicode sequence U+37F0 U+37E6 U+37F3 U+37EE U+37EC U+37E0.</summary>
    <author>
        <name>admin</name>
        <uri>http://www.autobugfix.com</uri>
    </author>
    
        <category term="etc." scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="domains" label="domains" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="unicode" label="unicode" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.autobugfix.com/">
        <![CDATA[<p>
Indeed, it is! Accidentally, while testing some random stuff against my IDN validation function,
I found out that the word
</p>
<pre class="code">
bullshit
</pre>
<p>
is the result of the
<a href="http://en.wikipedia.org/wiki/Punycode" target="_blank">Punycode Algorithm</a>
applied to the <a href="http://en.wikipedia.org/wiki/Unicode" target="_blank">Unicode</a>
sequence
</p>
<pre class="code">
U+37F0 U+37E6 U+37F3 U+37EE U+37EC U+37E0.
</pre>
<p>
These are six Chinese characters from the
<a href="http://en.wikipedia.org/wiki/CJK_Unified_Ideographs" target="_blank">CJK Unified Ideographs Extension A</a>
block.
I had never seen Punycode produce a meaningful word before, and I think this is
an extremely rare case. So I couldn't help registering both
<a href="http://xn--bullshit.com" target="_blank">xn--bullshit.com</a>
and
<a href="http://xn--bullshit.net" target="_blank">xn--bullshit.net</a>
immediately.
</p>
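<p>
The claim is easy to check with Python's built-in <code>punycode</code> codec. This is just a quick sketch; the variable names are arbitrary.
</p>

```python
ace_label = "xn--bullshit"

# Strip the ACE prefix and decode the remainder as Punycode.
decoded = ace_label[4:].encode("ascii").decode("punycode")

print(" ".join("U+%04X" % ord(ch) for ch in decoded))
# U+37F0 U+37E6 U+37F3 U+37EE U+37EC U+37E0
```

<p>
Since the label contains no hyphen after the prefix, every one of the eight letters encodes position and code point deltas; none of them is a literal ASCII character of the decoded name.
</p>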
<p>
<a href="http://xn--bullshit.com" target="_blank"><img src="http://source.beta-blog.net/blog/2010/07/xn--bullshit.png" alt="xn--bullshit.com" /></a>
</p>
<p>
I've no idea what to do with it yet - we'll see :)
</p>
<p>
The discovery was announced on <a href="http://blog.http.net/domains/xn-bullshit/" target="_blank">blog.http.net</a> first.
</p>]]>
        
    </content>
</entry>

<entry>
    <title>Is LINQ functional?</title>
    <link rel="alternate" type="text/html" href="http://www.autobugfix.com/2010/03/is-linq-functional.html" />
    <id>tag:www.autobugfix.com,2010://3.21</id>

    <published>2010-03-31T20:11:19Z</published>
    <updated>2013-04-04T17:16:12Z</updated>

    <summary>With its 3.5 extensions, the .NET framework started to turn into a really cool looking programming concept, not least due to the syntactic sugar of LINQ. One reason for that is surely its functional look. Well, as LINQ is integrated into an imperative context, it will never be able to guarantee state-free evaluation as a genuine functional language does. Nevertheless, it&apos;s worth discussing and playing around with a few aspects of it in terms of a multi-paradigm programming concept.</summary>
    <author>
        <name>admin</name>
        <uri>http://www.autobugfix.com</uri>
    </author>
    
        <category term=".NET" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="algorithms" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="c" label="C#" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="dotnet" label="dotNET" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="math" label="math" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.autobugfix.com/">
        <![CDATA[<p>
With its 3.5 extensions, the .NET framework started to turn into a really
cool looking programming concept,
not least thanks to the syntactic sugar of
<a href="http://msdn.microsoft.com/en-us/library/bb397676.aspx" target="_blank">LINQ</a>.
One reason for that is surely its <a href="http://en.wikipedia.org/wiki/Functional_programming" target="_blank">functional</a>
look.
However, since LINQ is embedded in an imperative context, it will never be able to
guarantee state-free evaluation the way a genuinely functional language does.
Nevertheless, it's worth discussing and playing around with a few of its aspects
in terms of a multi-paradigm programming concept.
</p>

]]>
        <![CDATA[<h3>Delegating definitions in C# 3.0</h3>
<p>
First, the concept of
<a href="http://en.wikipedia.org/wiki/First-class_function" target="_blank">first-class functions</a>,
i.e. the introduction of function types, leads to the notion of closures.
For instance, a constant function such as
</p>
<pre class="code"><code class="csharpnet"><span class="kwd def">Func</span>&lt;<span class="kwd builtin">int</span>&gt; <span class="type">i</span> = () =&gt; <span class="num">1</span>;
</code></pre>
<p>
defines something like a read-only variable.
You may get its value now, later, or never,
but as long as <span class="code">i</span> itself is not reassigned, you can be sure that its value will never change anywhere in your code.
Hence, this odd-looking piece of code gives you a little more
control over your program.
That's a basic idea of functional programming.
</p>

<p>
The concept of function types leads to higher-order
functions, i.e. functions mapping functions to other functions.
One of them is the <a href="http://en.wikipedia.org/wiki/Currying" target="_blank">curry functor</a>,
a key concept in the theory of functional programming:
</p>
<p class="quote">
<span class="math">curry: (X <span class="small">x</span> Y &rarr; Z) &rarr; (X  &rarr; Y  &rarr; Z)</span>
</p>
<p>
That is, for any function <span class="math">f(x,y)</span>, there is a curried function
<span class="math">curry(f)(x)</span>
taking <span class="math">x</span> to a function <span class="math">g(y) = f(x,y)</span>.
This is easily implemented in C# using generic types:
</p>
<pre class="code">
<code class="csharpnet"><span class="kwd builtin">static</span> <span class="kwd def">Func</span>&lt;<span class="type">X</span>, <span class="kwd def">Func</span>&lt;<span class="type">Y</span>, <span class="type">Z</span>&gt;&gt; <span class="type">Curry</span>&lt;<span class="type">X</span>, <span class="type">Y</span>, <span class="type">Z</span>&gt;(<span class="kwd def">Func</span>&lt;<span class="type">X</span>, <span class="type">Y</span>, <span class="type">Z</span>&gt; <span class="type">f</span>)
{
  <span class="kwd builtin">return</span> <span class="type">x</span> =&gt; <span class="type">y</span> =&gt; <span class="type">f</span>(<span class="type">x</span>, <span class="type">y</span>);
}
</code></pre>
<p>
(inspired by this <a target="_blank" href="http://jacobcarpenter.wordpress.com/2008/01/02/c-abuse-of-the-day-functional-library-implemented-with-lambdas/">C# abuse of the day</a>).
Well, that's more or less of academic interest, since one would hardly ever replace
<span class="code">x++</span> by
</p>
<pre class="code">
<code class="csharpnet"><span class="type">x</span> = <span class="type">Curry</span>&lt;<span class="kwd builtin">int</span>, <span class="kwd builtin">int</span>, <span class="kwd builtin">int</span>&gt;((<span class="type">a</span>, <span class="type">b</span>) =&gt; <span class="type">a</span> + <span class="type">b</span>)(<span class="num">1</span>)(<span class="type">x</span>); <span class="cmnt">// x++ ;)</span>
</code></pre>
<p>
A slightly more interesting example is the following:
</p>
<pre class="code">
<code class="csharpnet"><span class="cmnt">// using System.Text.RegularExpressions;</span>
<span class="kwd builtin">var</span> <span class="type">grep</span> = <span class="type">Curry</span>&lt;<a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.aspx" target="_blank" rel="nofollow">Regex</a>, <a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx" target="_blank" rel="nofollow">IEnumerable</a>&lt;<span class="kwd builtin">string</span>&gt;, <a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx" target="_blank" rel="nofollow">IEnumerable</a>&lt;<span class="kwd builtin">string</span>&gt;&gt;(
  (<span class="type">regex</span>, <span class="type">list</span>) =&gt; <span class="kwd builtin">from</span> <span class="type">s</span> <span class="kwd builtin">in</span> <span class="type">list</span>
                   <span class="kwd builtin">where</span> <span class="type">regex.Match</span>(<span class="type">s</span>).<span class="type">Success</span>
                   <span class="kwd builtin">select</span> <span class="type">s</span>);
<span class="kwd builtin">var</span> <span class="type">grepFoo</span> = <span class="type">grep</span>(<span class="kwd builtin">new</span> <a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.aspx" target="_blank" rel="nofollow">Regex</a>(<span class="str">&quot;foo&quot;</span>));
</code></pre>
<p>
Thus, <span class="code">grepFoo</span> will grep all words containing
<code class="csharpnet"><span class="str">&quot;foo&quot;</span></code>
from a word list.
Note that after the statement
</p>
<pre class="code">
<code class="csharpnet"><span class="kwd builtin">var</span> <span class="type">fooList</span> = <span class="type">grepFoo</span>(<span class="kwd builtin">new</span> <span class="kwd builtin">string</span>[]{<span class="str">&quot;foo&quot;</span>, <span class="str">&quot;bar&quot;</span>, <span class="str">&quot;foobar&quot;</span>});
</code></pre>
<p>
no regex has been applied yet.
Indeed, <code class="csharpnet"><span class="type">fooList</span></code>
is of type
<code class="csharpnet"><a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx" target="_blank" rel="nofollow">IEnumerable</a>&lt;<span class="kwd builtin">string</span>&gt;</code>
and has not been enumerated at this point.
So the evaluation of the expression is deferred until its result is needed by another computation,
which smells like lazy evaluation.
</p>
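<p>
The deferral can be made visible with a small, self-contained sketch.
The names <span class="code">DeferredDemo</span>, <span class="code">GrepFoo</span> and
<span class="code">Calls</span> are made up for this illustration, and the filter uses a plain
<span class="code">Contains</span> instead of a regex; it merely counts how often the predicate is actually invoked:
</p>
<pre class="code"><code class="csharpnet">using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    public static int Calls; // counts how often the filter actually runs

    // Hypothetical stand-in for grepFoo: builds the query, but does not run it.
    public static IEnumerable&lt;string&gt; GrepFoo(IEnumerable&lt;string&gt; words)
    {
        return words.Where(s =&gt; { Calls++; return s.Contains("foo"); });
    }

    public static void Main()
    {
        var query = GrepFoo(new[] { "foo", "bar", "foobar" });
        Console.WriteLine(Calls);        // 0: the query is defined, but not enumerated
        var result = query.ToList();     // enumeration happens here
        Console.WriteLine(Calls);        // 3: the filter ran once per word
        Console.WriteLine(result.Count); // 2: "foo" and "foobar"
    }
}
</code></pre>
<p>
The counter stays at zero until <span class="code">ToList</span> enumerates the query.
</p>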

<h3>LINQ is not lazy!</h3>
<p>
One of the most important paradigms of functional programming is the concept of
<a href="http://en.wikipedia.org/wiki/Lazy_evaluation" target="_blank">lazy evaluation</a>.
For instance, in a functional language, such as the good old
<a href="http://haskell.org/" target="_blank">Haskell</a>,
an expression such as
</p>
<pre class="code"><code>length [1, 2, 3 `div` 0]
</code></pre>
<p>
evaluates to <span class="code">3</span>.
That is, the runtime is too lazy to fail on the division by zero,
neither at compile time nor at run time, since it doesn't need to know any element
of the list in order to calculate its length.
In <em>C#</em> (where you can't even compile a constant expression such as <span class="code">1/0</span>),
you may write
</p>
<pre class="code"><code class="csharpnet"><span class="kwd builtin">var</span> <span class="type">q1</span> = <span class="kwd builtin">from</span> <span class="type">i</span> <span class="kwd builtin">in</span> (<a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx" target="_blank" rel="nofollow">IEnumerable</a>&lt;<span class="kwd builtin">int</span>&gt;)<span class="kwd builtin">new</span> <span class="kwd builtin">int</span>[] { <span class="num">1</span>, <span class="num">2</span>, <span class="num">3</span> }
         <span class="kwd builtin">select</span> <span class="num">1</span>/(<span class="type">i</span> - <span class="num">3</span>);
</code></pre>
<p>
without getting a run-time error.
But this has nothing to do with lazy evaluation: the query expression isn't evaluated at all at this point
(in contrast to the array definition inside the query), so it is simply treated like a function definition.
However, as soon as an aggregation expression such as
</p>
<pre class="code"><code class="csharpnet"><span class="kwd builtin">int</span> <span class="type">three</span> = <span class="type">q1.Count</span>();
</code></pre>
<p>
is reached, a
<span class="code"><code class="csharpnet"><a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.dividebyzeroexception.aspx" target="_blank" rel="nofollow">DivideByZeroException</a></code></span>
will be thrown.
Thus, LINQ evaluates eagerly here, not lazily.
On the other hand,
</p>
<pre class="code"><code class="csharpnet"><span class="kwd builtin">int</span> <span class="type">two</span> = <span class="type">q1.Take</span>(<span class="num">2</span>).<span class="type">Count</span>();
</code></pre>
<p>
works fine, since the black hole stays unevaluated due to the <code>Take</code> operator.
But, having
</p>
<pre class="code"><code class="csharpnet"><span class="kwd builtin">var</span> <span class="type">q2</span> = <span class="kwd builtin">from</span> <span class="type">i</span> <span class="kwd builtin">in</span> (<a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx" target="_blank" rel="nofollow">IEnumerable</a>&lt;<span class="kwd builtin">int</span>&gt;)<span class="kwd builtin">new</span> <span class="kwd builtin">int</span>[] { <span class="num">1</span>, <span class="num">2</span>, <span class="num">3</span> }
         <span class="kwd builtin">select</span> <span class="num">1</span>/(<span class="type">i</span> - <span class="num">1</span>);
<span class="kwd builtin">int</span> <span class="type">two2</span> = <span class="type">q2.Skip</span>(<span class="num">1</span>).<span class="type">Count</span>();
</code></pre>
<p>
instead, you will (guess what!) catch the exception again.
Thus, in contrast to the <span class="code"><code class="csharpnet"><span class="type">Take</span></code></span> operator,
the <span class="code"><code class="csharpnet"><span class="type">Skip</span></code></span> operator
does iterate through the skipped elements and hence evaluates them
(at least in the LINQ to Objects implementation of .NET 3.5).
OK, that's no surprise, since these operators use the
<code class="csharpnet"><a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerator.aspx" target="_blank" rel="nofollow">IEnumerator</a></code>
provided by the corresponding
<code class="csharpnet"><a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx" target="_blank" rel="nofollow">IEnumerable</a></code>.
So, LINQ pretends to be lazy in the way that
</p>
<pre class="code"><code class="csharpnet"><span class="kwd builtin">var</span> <span class="type">p</span> = <span class="type">q2.Reverse</span>();
</code></pre>
<p>
won't be evaluated at this point and thus doesn't fail, whereas
</p>
<pre class="code"><code class="csharpnet"><span class="kwd builtin">int</span> <span class="type">two3</span> = <span class="type">p.Take</span>(<span class="num">2</span>).<span class="type">Count</span>();
</code></pre>
<p>
then throws the exception again, even though the evil element shouldn't be taken here:
<code>Reverse</code> has to buffer the whole sequence before it can yield its first element.
</p>
<p>
A functional approach to force laziness would be to replace value expressions by
constant functions, but the compiler won't accept something like this:
</p>
<pre class="code"><code class="csharpnet"><span class="cmnt">// The type of the expression in the select clause is incorrect.</span>
<span class="cmnt">// Type inference failed in the call to &#039;Select&#039;.</span>
<span class="kwd builtin">var</span> <span class="type">q1_</span> = <span class="kwd builtin">from</span> <span class="type">i</span> <span class="kwd builtin">in</span> (<a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx" target="_blank" rel="nofollow">IEnumerable</a>&lt;<span class="kwd builtin">int</span>&gt;)<span class="kwd builtin">new</span> <span class="kwd builtin">int</span>[] { <span class="num">1</span>, <span class="num">2</span>, <span class="num">3</span> }
          <span class="kwd builtin">select</span> () =&gt; <span class="num">1</span> / (<span class="type">i</span> - <span class="num">3</span>);
</code></pre>
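<p>
The error arises because a bare lambda has no type of its own, so the result type of
<span class="code">Select</span> cannot be inferred.
As a sketch only (the names <span class="code">ThunkDemo</span> and <span class="code">Thunks</span> are hypothetical),
casting the lambda to <span class="code">Func&lt;int&gt;</span> lets the query compile
and defers each division until the corresponding delegate is invoked:
</p>
<pre class="code"><code class="csharpnet">using System;
using System.Linq;

public static class ThunkDemo
{
    // Hypothetical helper: wraps each division into a Func&lt;int&gt; that runs only on demand.
    public static Func&lt;int&gt;[] Thunks()
    {
        var q = from i in new[] { 1, 2, 3 }
                select (Func&lt;int&gt;)(() =&gt; 1 / (i - 3)); // the cast gives the lambda a type
        return q.ToArray(); // enumerates the query, but only creates the delegates
    }

    public static void Main()
    {
        Func&lt;int&gt;[] thunks = Thunks();
        Console.WriteLine(thunks.Length); // 3: no division has been performed yet
        Console.WriteLine(thunks[0]());   // 0, since 1 / (1 - 3) truncates toward zero
        try
        {
            thunks[2]();                  // only now is 1 / 0 evaluated
        }
        catch (DivideByZeroException)
        {
            Console.WriteLine("division by zero");
        }
    }
}
</code></pre>
<p>
So per-element laziness is possible, but it has to be spelled out by hand.
</p>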
<p>
Hence, LINQ isn't lazy, but it has a smart way of making function definitions
look like statement expressions.
</p>


<h3>Diving into recursion</h3>
<p>
Remember the famous
<a href="http://en.wikipedia.org/wiki/Fibonacci_number" target="_blank">Fibonacci numbers</a>:
</p>
<p class="quote">
<span class="math">fib<sub>0</sub> = 0, fib<sub>1</sub> = 1, fib<sub>n</sub> = fib<sub>n-1</sub> + fib<sub>n-2</sub>.</span>
</p>
<p>
The sequence starts with
</p>
<p class="quote">
<span class="math">fibs = 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, ...</span>
</p>
<p>
where <span class="math">fibs<sub>100</sub></span> is already a 21-digit number, so the sequence grows quite fast.
Although one may calculate Fibonacci numbers in constant time (within the limits of floating-point precision) using
<a href="http://mathworld.wolfram.com/BinetsFibonacciNumberFormula.html" target="_blank">Binet's formula</a>,
the definition leads to interesting comparisons of different recursion strategies.
</p>

<p>
Well, let's have a
</p>
<pre class="code"><code class="csharpnet"><span class="kwd builtin">delegate</span> <span class="kwd builtin">long</span> <span class="kwd def">Fibonacci</span>(<span class="kwd builtin">int</span> <span class="type">n</span>);
</code></pre>
<p>
A direct translation of the definition into a lambda recursion looks like this:
</p>
<pre class="code"><code class="csharpnet"><span class="kwd def">Fibonacci</span> <span class="type">fib1</span> = <span class="kwd builtin">null</span>; <span class="cmnt">// pre-assigned for use within recursion</span>
<span class="type">fib1</span> = <span class="type">n</span> =&gt; <span class="type">n</span> &lt;= <span class="num">1</span> ? <span class="type">n</span> : <span class="type">fib1</span>(<span class="type">n</span> - <span class="num">1</span>) + <span class="type">fib1</span>(<span class="type">n</span> - <span class="num">2</span>);
</code></pre>
<p>
The funny thing about this implementation is that the Fibonacci function itself determines its run time:
It's <span class="math">O(fib<sub>n</sub>)</span>, i.e. lower values are
recalculated again and again in order to get a higher one, because intermediate results are never stored.
</p>
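<p>
To make the blow-up tangible, here is a minimal sketch that counts the invocations.
It uses the same recursion as <span class="code">fib1</span>, written as a static method so that a counter can be attached;
<span class="code">CallCount</span>, <span class="code">Fib</span> and <span class="code">Calls</span> are names invented for this example:
</p>
<pre class="code"><code class="csharpnet">using System;

public static class CallCount
{
    public static long Calls; // number of invocations so far

    // Same recursion as fib1, plus a counter.
    public static long Fib(int n)
    {
        Calls++;
        return n &lt;= 1 ? n : Fib(n - 1) + Fib(n - 2);
    }

    public static void Main()
    {
        Console.WriteLine(Fib(20)); // 6765
        Console.WriteLine(Calls);   // 21891 = 2 * fib(21) - 1
    }
}
</code></pre>
<p>
The number of calls satisfies the same recurrence as the Fibonacci numbers themselves (plus one per call),
which is why it grows at the same rate.
</p>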

<p>
Now, in Haskell you may get around this very elegantly by defining an infinite list:
</p>
<pre class="code"><code>fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
</code></pre>
<p>
The list is initialized with two elements.
Conceptually, the <code>tail</code> function drops the first element of the <code>fibs</code> list,
while <code>zipWith (+)</code> creates a new list by adding the elements of
<code>fibs</code> and <code>(tail fibs)</code> pairwise.
In practice, Haskell is smart and lazy enough to avoid any needless recalculation
of numbers already present in the <code>fibs</code> list.
Thus, the algorithm applied here is the same one a human being would apply spontaneously using a
pencil and a sheet of paper. So, it's <span class="math">O(n)</span>.
</p>

<p>
To define an infinite list in C#, one should
implement the
<code class="csharpnet"><a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx" target="_blank" rel="nofollow">IEnumerable</a></code>
interface in the way that
the corresponding
<code class="csharpnet"><a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerator.aspx" target="_blank" rel="nofollow">IEnumerator</a></code>
expands the list on demand within its
<code class="csharpnet"><span class="type">MoveNext</span>()</code>
method.
Here, a small inline helper is enough,
taking a list and an expanding function to a
<code class="csharpnet"><span class="kwd def">Fibonacci</span></code> type:
</p>
<pre class="code"><code class="csharpnet"><span class="kwd def">Func</span>&lt;
  <a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx" target="_blank" rel="nofollow">IEnumerable</a>&lt;<span class="kwd builtin">long</span>&gt;,
  <span class="kwd def">Func</span>&lt;<a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx" target="_blank" rel="nofollow">IEnumerable</a>&lt;<span class="kwd builtin">long</span>&gt;, <a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx" target="_blank" rel="nofollow">IEnumerable</a>&lt;<span class="kwd builtin">long</span>&gt;&gt;,
  <span class="kwd def">Fibonacci</span>&gt; <span class="type">infList</span> = <span class="kwd builtin">null</span>;
<span class="type">infList</span> = (<span class="type">list</span>, <span class="type">exp</span>) =&gt; <span class="type">n</span> =&gt; <span class="type">n</span> &lt; <span class="type">list.Count</span>() ?
  <span class="type">list.Skip</span>(<span class="type">n</span>).<span class="type">First</span>() : <span class="type">infList</span>(<span class="type">exp</span>(<span class="type">list</span>), <span class="type">exp</span>)(<span class="type">n</span>);
</code></pre>
<p>
Now, LINQ also provides a <code>Zip</code> operator (as of .NET 4.0).
So, a simple syntactic translation of the Haskell list would look like this:
</p>
<pre class="code"><code class="csharpnet"><span class="kwd def">Func</span>&lt;<a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx" target="_blank" rel="nofollow">IEnumerable</a>&lt;<span class="kwd builtin">long</span>&gt;, <a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx" target="_blank" rel="nofollow">IEnumerable</a>&lt;<span class="kwd builtin">long</span>&gt;&gt; <span class="type">fibZip</span> = <span class="type">fibs</span> =&gt;
  <span class="type">fibs.Take</span>(<span class="num">2</span>).<span class="type">Concat</span>(<span class="type">fibs.Zip</span>(<span class="type">fibs.Skip</span>(<span class="num">1</span>), (<span class="type">x</span>, <span class="type">y</span>) =&gt; <span class="type">x</span> + <span class="type">y</span>));
</code></pre>
<p>
Hm, but this one is even worse than the naive recursion.
Indeed, trying
</p>
<pre class="code"><code class="csharpnet"><span class="kwd def">Fibonacci</span> <span class="type">fib2</span> = <span class="type">infList</span>(<span class="kwd builtin">new</span> <span class="kwd builtin">long</span>[] { <span class="num">0</span>, <span class="num">1</span> }, <span class="type">fibZip</span>);
</code></pre>
<p>
you will see that nothing is reused this way: enumerables don't remember their elements, so every
enumeration recomputes the whole chain from scratch. The concept of enumeration simply is not functional.
We may repair the <code>fibZip</code> as follows:
</p>
<pre class="code"><code class="csharpnet"><span class="kwd def">Func</span>&lt;<a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx" target="_blank" rel="nofollow">IEnumerable</a>&lt;<span class="kwd builtin">long</span>&gt;, <a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx" target="_blank" rel="nofollow">IEnumerable</a>&lt;<span class="kwd builtin">long</span>&gt;&gt; <span class="type">fibZip2</span> = <span class="type">fibs</span> =&gt;
  <span class="type">fibs.Concat</span>((<a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx" target="_blank" rel="nofollow">IEnumerable</a>&lt;<span class="kwd builtin">long</span>&gt;)(<span class="kwd builtin">new</span> <span class="kwd builtin">long</span>[] {
    <span class="type">fibs.Skip</span>(<span class="type">fibs.Count</span>() - <span class="num">2</span>).<span class="type">Sum</span>() }));
</code></pre>
<p>
This one looks a bit weird, since it's not that easy to extend an
<code class="csharpnet"><a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx" target="_blank" rel="nofollow">IEnumerable</a></code>
by one element. Anyway,
</p>
<pre class="code"><code class="csharpnet"><span class="kwd def">Fibonacci</span> <span class="type">fib3</span> = <span class="type">infList</span>(<span class="kwd builtin">new</span> <span class="kwd builtin">long</span>[] { <span class="num">0</span>, <span class="num">1</span> }, <span class="type">fibZip2</span>);
</code></pre>
<p>
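indeed does the job with only <span class="math">O(n)</span> additions
(although the repeated <code>Count</code> and <code>Skip</code> calls still re-enumerate the list),
even though the idea of an infinite list has lost its magic this way.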
</p>
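<p>
For comparison, C# iterator blocks offer another way to describe an endless sequence whose elements are produced on demand.
The following is only a sketch and not the approach taken above; <span class="code">FibIterator</span> and
<span class="code">Fibs</span> are hypothetical names:
</p>
<pre class="code"><code class="csharpnet">using System;
using System.Collections.Generic;
using System.Linq;

public static class FibIterator
{
    // Hypothetical helper: yields Fibonacci numbers one by one, without end.
    public static IEnumerable&lt;long&gt; Fibs()
    {
        long a = 0, b = 1;
        while (true)
        {
            yield return a;
            long next = a + b;
            a = b;
            b = next;
        }
    }

    public static void Main()
    {
        // Take bounds the otherwise endless enumeration.
        string[] firstTen = Fibs().Take(10).Select(x =&gt; x.ToString()).ToArray();
        Console.WriteLine(string.Join(", ", firstTen)); // 0, 1, 1, 2, 3, 5, 8, 13, 21, 34
        Console.WriteLine(Fibs().ElementAt(50));        // 12586269025
    }
}
</code></pre>
<p>
Each call to <span class="code">Fibs()</span> starts over from the beginning, so nothing is shared between enumerations,
and <span class="code">long</span> overflows beyond <span class="math">fib<sub>92</sub></span>.
</p>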

<h3>Conclusion</h3>
<p>
As expected, neither C# nor LINQ turns out to implement
the paradigms of a functional language.
Nonetheless, it's really fancy. 8-)
</p>]]>
    </content>
</entry>

<entry>
    <title>getting rid of MT&apos;s permalink file extensions</title>
    <link rel="alternate" type="text/html" href="http://www.autobugfix.com/2010/02/getting-rid-of-mts-permalink-file-extensions.html" />
    <id>tag:www.autobugfix.com,2010://3.23</id>

    <published>2010-02-14T00:07:42Z</published>
    <updated>2013-04-11T13:54:21Z</updated>

    <summary>While designing scalable and portable web projects, it&apos;s always a good idea to design hyperlinks independently of the physical file path from which the respective request will be served.</summary>
    <author>
        <name>admin</name>
        <uri>http://www.autobugfix.com</uri>
    </author>
    
        <category term="MT hacks" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Perl" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="http" label="HTTP" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="mt" label="mt" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="perl" label="Perl" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.autobugfix.com/">
        <![CDATA[<p>
While designing scalable and portable web projects, it's always a good idea to design hyperlinks independently of the physical file path from which the respective request will be served.
</p>]]>
        <![CDATA[<p>
For instance, I usually have permalinks looking like this one:
<collapsible><pre>
<a href="http://beta-blog.net/2010/02/getting-rid-of-mts-permalink-file-extensions">http://beta-blog.net/2010/02/getting-rid-of-mts-permalink-file-extensions</a>
</pre></collapsible>
Thus I can decide later on whether I prefer to serve pages as html, shtml, php, or whatever, without losing the permanence of my permalink.
Now, since I have chosen <a target="_blank" href="http://www.movabletype.org/documentation/administrator/publishing/static-and-dynamic-publishing.html">static publishing</a> with HTML extension, MT creates a file such as
<collapsible><pre>
/var/www/beta-blog.net/2010/02/getting-rid-of-mts-permalink-file-extensions.html
</pre></collapsible>
so I can tell Apache how it should map my permalink to the physical file path using
<a target="_blank" href="http://httpd.apache.org/docs/2.0/mod/mod_rewrite.html">mod_rewrite</a>:
<collapsible><pre>
# DocumentRoot /var/www/beta-blog.net
RewriteEngine on
RewriteBase /
RewriteRule ^([^\.]+[^\/])$ $1.html [L]
</pre></collapsible>
<p>
That is, any request path not containing a dot and not ending with a trailing slash will be rewritten to the corresponding HTML file. (Yes, that way I won't be able to serve permalinks containing dots, but I can live with that ;)
</p>
<p>
Now that's nice, but unfortunately, within MT you are not able to configure the look of permalinks independently of the file path.
Nobody wants to remove the extension from the actual file, since the same directory might contain some .js, .jpg and other files mapped to their own content types as well. (And I have no idea how to set up an <a target="_blank" href="http://httpd.apache.org/docs/2.0/mod/mod_mime.html#addhandler">AddHandler directive</a> that applies exclusively to files without an extension.) Therefore, the solution is to let MT add the configured extension to the actual file, but remove the extension from the permalink.
</p>
<p>
The bad news is that one has to hack MT's source code itself; the good news is that it's quite a simple change to the <code>archive_url</code> method within MT's Entry module. Applying the following patch to <code>lib/MT/Entry.pm</code> removes .php/.html extensions from archive links:
</p>
<collapsible><pre>
--- MTOS-5.01-en.orig/lib/MT/Entry.pm   2010-02-14 01:17:24.000000000 +0100
+++ MTOS-5.01-en/lib/MT/Entry.pm        2010-02-14 01:58:58.000000000 +0100
@@ -541,7 +541,11 @@
     my $blog = $entry->blog() || return;
     my $url = $blog->archive_url || "";
     $url .= '/' unless $url =~ m!/$!;
-    $url . $entry->archive_file(@_);
+    #$url . $entry->archive_file(@_);
+    ## --&gt; HACK: remove .php/.html extensions &lt;-- ##
+    my $f = $entry->archive_file(@_);
+    $f =~ s/\.(php|html)$//;
+    $url . $f;
 }

 sub permalink {
</pre></collapsible>
<p>
Well, that's really a dirty hack, and things will get more complicated with dynamic publishing. So, hopefully one day MT's developers will implement a cleaner solution and make it configurable through the web interface. *hack*
</p>]]>
    </content>
</entry>

<entry>
    <title>¡La Gomera!</title>
    <link rel="alternate" type="text/html" href="http://www.autobugfix.com/2010/01/la-gomera.html" />
    <id>tag:www.autobugfix.com,2010://3.24</id>

    <published>2010-01-09T16:35:09Z</published>
    <updated>2013-04-11T13:59:23Z</updated>

    <summary>One of those places where you understand: this planet is by far the most beautiful one in the universe.</summary>
    <author>
        <name>admin</name>
        <uri>http://www.autobugfix.com</uri>
    </author>
    
        <category term="META" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.autobugfix.com/">
        <![CDATA[One of those places where you understand: this planet is by far the most beautiful one in the universe. 8-)
<br /><br />
]]>
        <![CDATA[<img src="http://www.autobugfix.com/2010/01/09/lagomera1.jpg" alt="¡La Gomera!" style="width:640px;height:480px;" />]]>
    </content>
</entry>

<entry>
    <title>AutoSmileys:-) Plugin for Movable Type</title>
    <link rel="alternate" type="text/html" href="http://www.autobugfix.com/2009/12/autosmileys-plugin-for-movable-type.html" />
    <id>tag:www.autobugfix.com,2009://3.25</id>

    <published>2009-12-06T16:13:41Z</published>
    <updated>2013-04-28T21:34:47Z</updated>

    <summary>Freely configurable automatic replacement of textual emoticons by image tags. AutoSmileys is an easy to use and highly customizable macro environment for Movable Type. It will replace self-defined text abbreviations by image tags when your site is published or dynamically rendered, respectively. It may be applied either within entries, comments, pages, or any other part of your site.</summary>
    <author>
        <name>admin</name>
        <uri>http://www.autobugfix.com</uri>
    </author>
    
        <category term="MT hacks" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Perl" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="mt" label="mt" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="perl" label="Perl" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="regex" label="RegEx" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.autobugfix.com/">
        <![CDATA[<p>
<em>AutoSmileys</em> is an easy to use and highly customizable macro environment
for Movable Type. It will replace self-defined text abbreviations by image tags
when your site is published or dynamically rendered, respectively. It may be applied
either within entries, comments, pages, or any other part of your site.
</p>
]]>
        <![CDATA[<p>
<em>AutoSmileys</em> works with both static and dynamic publishing and has been
tested with MT 4.2, MT 4.3, and MT 5.0 beta (Perl 5.8.8 / PHP 5.2.9).
(Dynamic publishing with AutoSmileys requires PHP5 in either case.)
</p>

<h4>Download</h4>
<p>
<a href="http://source.beta-blog.net/autosmileys/0.9/AutoSmileys-0.9.zip">http://source.beta-blog.net/autosmileys/0.9/AutoSmileys-0.9.zip</a>
</p>

<h4>Installation</h4>
<ol>
<li>
Download <em>AutoSmileys-0.9.zip</em>, unzip it and copy the <em>AutoSmileys</em>
directory from the zip file into the plugins directory of your Movable Type installation.
</li>
<li>
Sign in to your Movable Type CMS.
</li>
<li>
For each blog you wish to use AutoSmileys in, select <em>Design &gt; Templates</em> from the menu and open
the desired template for editing.
</li>
<li>
Within each of these templates, enclose the desired parts with the
<span class="code">&lt;mt:AutoSmileys&gt; ... &lt;/mt:AutoSmileys&gt;</span> block tag.
</li>
<li>
Republish your site.
</li>
</ol>
<p>
So for instance, if you wish to have <em>AutoSmileys</em> replacement on your entry main body,
edit the <em>Entry</em> template as follows:
</p>
<collapsible>
<p style="text-align:center">
<img alt="template1.jpg" src="http://beta-blog.net/2009/12/06/template1.jpg" style="width:477px;height:275px;" />
</p>
</collapsible>
<p>
Alternatively, you may also put the whole HTML body inside the <em>AutoSmileys</em> tag.
This would also apply <em>AutoSmileys</em> to comments, since the <em>Comments</em>
module template is included into the <em>Entry</em> template.
</p>
<p>
For general installation instructions concerning Movable Type, see
<a href="http://www.movabletype.org/documentation/installation/" target="_blank">http://www.movabletype.org/documentation/installation/</a>.
</p>

<h4>Configuration</h4>
<p>
Once you have installed <em>AutoSmileys</em>, sign in to your Movable Type CMS and open
the <em>Tools &gt; Plugins</em> page. Within the <em>AutoSmileys 0.9</em> panel, expand the
<em>Settings</em> tab. By default, it will look as follows:
</p>
<collapsible>
<p style="text-align:center">
<img alt="settings1.jpg" src="http://beta-blog.net/2009/12/06/settings1.jpg" style="width:588px;height:701px;" />
</p>
</collapsible>
<p>
On the top of the form you may define HTML tag names whose content will be ignored
in order to avoid inappropriate image tag placement.
The list beneath consists of text abbreviations (tokens)
and associated image source URLs.
</p>
<p>
Default smileys are taken from
<a href="http://www.freesmileys.org" target="_blank">freesmileys.org</a>. *thx*
</p>
<p>
You may change these entries as you like and you may also expand or shorten the list
by clicking on the <span class="code">[+]</span> and <span class="code">[-]</span>
links on the right hand side (JavaScript required).
</p>
<p>
The list may be expanded with up to 40 rows by default.
You may increase the related parameter
<span class="code">maxRows</span>
by editing <span class="code">AutoSmileys.pl</span>. *hack*
</p>

<h4>Remarks</h4>
<p>
Remember that MT tags are applied after any text filter you may have set up for
entries or comments.
Hence, text filters such as <em>Textile</em> and <em>Markdown</em> would
replace stuff like <span class="code">&#42;lol&#42;</span> before
<em>AutoSmileys</em> has a chance to see it.
</p>
<p>
Also, note that tokens are recognized only in the way you have defined them
after standard HTML entity replacement has been applied.
That is, <span class="code">&amp;#58;&amp;#45;&amp;#41;</span>
will be displayed as <span class="code">&#58;&#45;&#41;</span> but not replaced.
</p>

<h4>How it works</h4>
<p>
The heart of <em>AutoSmileys</em> is a regular expression that is created dynamically from the
token-to-URL mappings defined above. Once these mappings have been collected in a Perl hash
called <span class="code"><code class="perl"><span class="var">%<span class="symb">mappings</span></span></code></span>, the pattern is built as follows:
</p>
<collapsible>
<pre class="code"><code class="perl"><a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="var">$<span class="symb">re_pattern</span></span> = <a class="kwd" href="http://perldoc.perl.org/functions/join.html" target="_blank" rel="nofollow">join</a> <span class="str">&#039;|&#039;</span>,
  <span class="qlo q"><span class="kwd">q</span><span class="op">/</span><span class="str">&lt;(!)--(?:.|\n)*?--&gt;</span><span class="op">/</span></span>,          <span class="cmnt"># markup comments</span>
  <span class="qlo q"><span class="kwd">q</span><span class="op">/</span><span class="str">&lt;([^\s&gt;]+)[^&gt;]*?(\/)?\s*&gt;</span><span class="op">/</span></span>,    <span class="cmnt"># markup tags</span>
  <a class="kwd" href="http://perldoc.perl.org/functions/map.html" target="_blank" rel="nofollow">map</a><span class="op ld">(</span><a class="kwd" href="http://perldoc.perl.org/functions/quotemeta.html" target="_blank" rel="nofollow">quotemeta</a>, <a class="kwd" href="http://perldoc.perl.org/functions/keys.html" target="_blank" rel="nofollow">keys</a> <span class="var">%<span class="symb">mappings</span></span><span class="op rd">)</span><span class="op stmt">;</span>  <span class="cmnt"># smileys</span></code></pre>
</collapsible>
<p>
Applied to reasonably valid markup, this pattern also consumes comments and whole tags,
so tokens are only replaced outside markup tags, and the content of those tags
you have specified as unsuitable for image tags is ignored.
Then,
</p>
<collapsible>
<pre class="code"><code class="perl"><span class="symb">s</span>/<span class="op ld">(</span><span class="var">$<span class="symb">re_pattern</span></span><span class="op rd">)</span>/&amp;<span class="symb">re_callback</span><span class="op ld">(</span><span class="var">$1</span>,<span class="var">$2</span>,<span class="var">$3</span>,<span class="var">$4</span><span class="op rd">)</span>/<span class="symb">eg</span><span class="op stmt">;</span></code></pre>
</collapsible>
<p>
does the job, using the following callback function:
</p>

<collapsible>
<pre class="code"><code class="perl"><a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="var">$<span class="symb">expect</span></span><span class="op stmt">;</span>
<a class="kwd" href="http://perldoc.perl.org/functions/sub.html" target="_blank" rel="nofollow">sub</a> <span class="symb">re_callback</span>
<span class="op ld">{</span>
  <a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="op ld">(</span><span class="var">$<span class="symb">match</span></span>, <span class="var">$<span class="symb">comment</span></span>, <span class="var">$<span class="symb">tagname</span></span>, <span class="var">$<span class="symb">selfclosed</span></span><span class="op rd">)</span> = <span class="var">@_</span><span class="op stmt">;</span>
  <a class="kwd" href="http://perldoc.perl.org/functions/return.html" target="_blank" rel="nofollow">return</a> <span class="var">$<span class="symb">match</span></span> <span class="kwd">if</span> <span class="var">$<span class="symb">comment</span></span> || <span class="var">$<span class="symb">selfclosed</span></span><span class="op stmt">;</span>
  <span class="kwd">if</span> <span class="op ld">(</span> <a class="kwd" href="http://perldoc.perl.org/functions/defined.html" target="_blank" rel="nofollow">defined</a> <span class="var">$<span class="symb">tagname</span></span> <span class="op rd">)</span> <span class="cmnt"># non-self-closing markup tag</span>
  <span class="op ld">{</span>
    <span class="var">$<span class="symb">tagname</span></span> = <a class="kwd" href="http://perldoc.perl.org/functions/lc.html" target="_blank" rel="nofollow">lc</a> <span class="var">$<span class="symb">tagname</span></span><span class="op stmt">;</span> <span class="cmnt"># ignore case</span>
    <span class="kwd">if</span> <span class="op ld">(</span> <a class="kwd" href="http://perldoc.perl.org/functions/defined.html" target="_blank" rel="nofollow">defined</a> <span class="var">$<span class="symb">expect</span></span> <span class="op rd">)</span>  <span class="cmnt"># within ignorance state</span>
    <span class="op ld">{</span>
      <a class="kwd" href="http://perldoc.perl.org/functions/undef.html" target="_blank" rel="nofollow">undef</a> <span class="var">$<span class="symb">expect</span></span> <span class="kwd">if</span> <span class="var">$<span class="symb">tagname</span></span> <a class="kwd" href="http://perldoc.perl.org/functions/eq.html" target="_blank" rel="nofollow">eq</a> <span class="var">$<span class="symb">expect</span></span><span class="op stmt">;</span> <span class="cmnt"># end ignorance state</span>
      <a class="kwd" href="http://perldoc.perl.org/functions/return.html" target="_blank" rel="nofollow">return</a> <span class="var">$<span class="symb">match</span></span><span class="op stmt">;</span>
    <span class="op rd">}</span>
    <span class="var">$<span class="symb">expect</span></span> = <span class="qlo qq"><span class="kwd">qq</span><span class="op">&lt;</span><span class="istr">/<span class="var">$<span class="symb">tagname</span></span></span><span class="op">&gt;</span></span>
      <span class="kwd">if</span> <a class="kwd" href="http://perldoc.perl.org/functions/exists.html" target="_blank" rel="nofollow">exists</a> <span class="var">$<span class="symb">ignoretags</span><span class="op ld">{</span><span class="var">$<span class="symb">tagname</span></span><span class="op rd">}</span></span><span class="op stmt">;</span> <span class="cmnt"># begin ignorance state</span>
    <a class="kwd" href="http://perldoc.perl.org/functions/return.html" target="_blank" rel="nofollow">return</a> <span class="var">$<span class="symb">match</span></span><span class="op stmt">;</span>
  <span class="op rd">}</span>
  <a class="kwd" href="http://perldoc.perl.org/functions/return.html" target="_blank" rel="nofollow">return</a> <span class="var">$<span class="symb">match</span></span> <span class="kwd">if</span> <a class="kwd" href="http://perldoc.perl.org/functions/defined.html" target="_blank" rel="nofollow">defined</a> <span class="var">$<span class="symb">expect</span></span> || !<span class="op ld">(</span><a class="kwd" href="http://perldoc.perl.org/functions/exists.html" target="_blank" rel="nofollow">exists</a> <span class="var">$<span class="symb">mappings</span><span class="op ld">{</span><span class="var">$<span class="symb">match</span></span><span class="op rd">}</span></span><span class="op rd">)</span><span class="op stmt">;</span>
  &amp;<span class="op ld">{</span><span class="var">$<span class="symb">Defaults</span><span class="op ptr">-&gt;</span><span class="op ld">{</span><span class="str">image_tag</span><span class="op rd">}</span></span><span class="op rd">}</span><span class="op ld">(</span><span class="var">$<span class="symb">mappings</span><span class="op ld">{</span><span class="var">$<span class="symb">match</span></span><span class="op rd">}</span></span>, <span class="var">$<span class="symb">match</span></span><span class="op rd">)</span><span class="op stmt">;</span>
<span class="op rd">}</span></code></pre>
</collapsible>
<p>
Here, <span class="code"><code class="perl"><span class="var">$<span class="symb">Defaults</span><span class="op ptr">-&gt;</span><span class="op ld">{</span><span class="str">image_tag</span><span class="op rd">}</span></span></code></span>
is the actual image tag function.
That's it 8-).
</p>
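<p>
To see the pieces above work together, here is a minimal, self-contained sketch.
The sample tokens, the image URLs, the contents of <span class="code">%ignoretags</span>
and the <span class="code">$image_tag</span> function are assumptions made up for illustration;
they stand in for the plugin's settings and for <span class="code">$Defaults-&gt;{image_tag}</span>.
</p>
<collapsible>
<pre class="code"><code class="perl">use strict;
use warnings;

# Hypothetical sample data; the real plugin reads these from its settings.
my %mappings   = ( ':-)' =&gt; '/smileys/smile.gif', '*lol*' =&gt; '/smileys/lol.gif' );
my %ignoretags = map { $_ =&gt; 1 } qw( pre code );

# Hypothetical stand-in for $Defaults-&gt;{image_tag}.
my $image_tag = sub {
  my ($src, $alt) = @_;
  return qq{&lt;img src="$src" alt="$alt" /&gt;};
};

my $re_pattern = join '|',
  q/&lt;(!)--(?:.|\n)*?--&gt;/,          # markup comments
  q/&lt;([^\s&gt;]+)[^&gt;]*?(\/)?\s*&gt;/,    # markup tags
  map(quotemeta, keys %mappings);  # smileys

my $expect;   # closing tag name we are waiting for, if any
sub re_callback {
  my ($match, $comment, $tagname, $selfclosed) = @_;
  return $match if $comment || $selfclosed;
  if ( defined $tagname ) {            # non-self-closing markup tag
    $tagname = lc $tagname;
    if ( defined $expect ) {           # inside an ignored tag
      undef $expect if $tagname eq $expect;
      return $match;
    }
    $expect = "/$tagname" if exists $ignoretags{$tagname};
    return $match;
  }
  return $match if defined $expect || !exists $mappings{$match};
  return $image_tag-&gt;($mappings{$match}, $match);
}

my $html = '&lt;p title=":-)"&gt;Hi :-) &lt;code&gt;*lol*&lt;/code&gt; *lol*&lt;/p&gt;';
(my $out = $html) =~ s/($re_pattern)/re_callback($1,$2,$3,$4)/eg;
print "$out\n";</code></pre>
</collapsible>
<p>
Under these assumptions, the token inside the <span class="code">title</span> attribute and the one inside
<span class="code">&lt;code&gt;</span> are left alone, while the two tokens in plain text are turned into image tags.
</p>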
]]>
    </content>
</entry>

<entry>
    <title>a wordlist folding algorithm</title>
    <link rel="alternate" type="text/html" href="http://www.autobugfix.com/2009/11/a-wordlist-folding-algorithm.html" />
    <id>tag:www.autobugfix.com,2009://3.26</id>

    <published>2009-11-28T23:33:32Z</published>
    <updated>2013-04-18T15:53:32Z</updated>

    <summary>Suppose you wish to match a large wordlist against a huge chunk of text. As a small test case, let
for, far, bar, foo, boofaz, boofar, boof, faz, foobaz, foobars, boofar
be your wordlist. Now, you may apply the corresponding regular expression:
But how would a regex engine implement the matching?</summary>
    <author>
        <name>admin</name>
        <uri>http://www.autobugfix.com</uri>
    </author>
    
        <category term="Perl" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="algorithms" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="codes" label="codes" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="perl" label="Perl" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="regex" label="RegEx" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.autobugfix.com/">
        <![CDATA[<p>
Suppose you wish to match a large wordlist against a huge chunk of text.
As a small test case, let
</p>
<pre class="code">
for, far, bar, foo, boofaz, boofar, boof, faz, foobaz, foobars, boofar
</pre>
<p>
be your wordlist. 
</p>
]]>
        <![CDATA[<p>
Now, you may apply the corresponding regular expression:
</p>
<collapsible>
<pre class="code">
(1) /\b(for|far|bar|foo|boofaz|boofar|boof|faz|foobaz|foobars|boofar)\b/
</pre>
</collapsible>
<p>
But how would a regex engine implement the matching?
There are different options. The very worst algorithm would surely be to
look up every word separately in the whole text. That would be the same as
doing
</p>
<collapsible>
<pre class="code"><code class="perl"><a class="kwd" href="http://perldoc.perl.org/functions/foreach.html" target="_blank" rel="nofollow">foreach</a> <span class="op ld">(</span><span class="qlo qw"><span class="kwd">qw</span><span class="op">(</span><span class="istr"> for far bar foo boofaz boofar boof faz foobaz foobars boofar </span><span class="op">)</span></span><span class="op rd">)</span>
<span class="op ld">{</span>
  <a class="kwd" href="http://perldoc.perl.org/functions/return.html" target="_blank" rel="nofollow">return</a> <span class="istr">&quot;matching!&quot;</span> <span class="kwd">if</span> <span class="var">$<span class="symb">text</span></span> =~ <span class="symb">m</span>/\<span class="symb">b</span><span class="var">$_</span>\<span class="symb">b</span>/<span class="op stmt">;</span>
<span class="op rd">}</span>
<a class="kwd" href="http://perldoc.perl.org/functions/return.html" target="_blank" rel="nofollow">return</a> <span class="istr">&quot;not matching.&quot;</span><span class="op stmt">;</span></code></pre>
</collapsible>
<p>
Suppose you match <span class="math">m</span> words against a text consisting of <span class="math">n</span> letters;
this piece of coding horror would then have an estimated runtime of <span class="math">O(m*n)</span>.
</p>

<p>
Now, a better approach would be to run through the text only once,
using a matching stack. Thus, assuming <span class="code">&quot; foobar &quot;</span> appears somewhere in
the text, the stack trace might look as follows (read from bottom to top):
</p>
<collapsible>
<pre class="code">
[7] ' ' =&gt; nothing matches.
[6] 'r' =&gt; &quot;foobars&quot; might match.
[5] 'a' =&gt; &quot;foobaz&quot; or &quot;foobars&quot; might match.
[4] 'b' =&gt; &quot;foobaz&quot; or &quot;foobars&quot; might match.
[3] 'o' =&gt; &quot;foo&quot;, &quot;foobaz&quot;, or &quot;foobars&quot; might match.
[2] 'o' =&gt; &quot;for&quot;, &quot;foo&quot;, &quot;foobaz&quot;, or &quot;foobars&quot; might match.
[1] 'f' =&gt; &quot;for&quot;, &quot;far&quot;, &quot;foo&quot;, &quot;faz&quot;, &quot;foobaz&quot;, or &quot;foobars&quot; might match.
[0] ' ' =&gt; &quot;\b&quot; matches.
</pre>
</collapsible>
<p>
But what if the wordlist gets large? It seems that we would have to run through nearly
the whole list each time a character is pushed onto the stack in order to
find out whether the current stack contents may still be matched or not.
</p>
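<p>
The following sketch makes that cost visible. It is only an illustration of the idea, not how a regex
engine actually works: the list of candidates is filtered again for every character read, and the
variable names are made up for this example. Word boundaries are left out for brevity.
</p>
<collapsible>
<pre class="code"><code class="perl">use strict;
use warnings;

my @words = qw( for far bar foo boofaz boofar boof faz foobaz foobars boofar );

# Feed the characters one by one and keep only the words that still fit.
my @candidates = @words;
my $seen = '';
for my $ch (split //, 'foobar') {
  $seen .= $ch;
  # every step scans all remaining candidates
  @candidates = grep { index($_, $seen) == 0 } @candidates;
  printf "%s =&gt; %s\n", $ch, join(', ', @candidates);
}</code></pre>
</collapsible>
<p>
The printed lines correspond to rows [1] to [6] of the stack trace above, ending with
<span class="code">foobars</span> as the only remaining candidate.
</p>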

<p>
It's clear that sorting the word list in advance would be a considerable
optimization. Moreover, instead of looking up one item after another,
a really smart approach would be to walk down a search tree.
As a tree, the wordlist above would look like this:
</p>
<collapsible>
<pre class="code">
          _____________|_____________
          |                         |
          b                         f
    ______|______        ___________|___________
    |           |        |                     |
   oof          ar       a                     o
    |                 ___|___            ______|______
    a ?               |     |            |           |
 ___|___              r     z            o           r
 |     |                                 |
 r     z                                 ba ?
                                     ____|____
                                     |       |
                                     rs      z
</pre>
</collapsible>
<p>
Here, the &quot;?&quot; denotes an optional node. Remember that the length of a path down
such a tree is, provided the tree is reasonably balanced, logarithmic in the number of nodes. Thus, loosely speaking,
we have improved the worst algorithm above to <span class="math">O(n*log(m))</span> at least.
</p>
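<p>
As a rough sketch of such a tree walk, the wordlist can be stored in nested hashes, one character per level.
This is a simplified variant of the tree above (no merged prefixes such as &quot;oof&quot;), and the function names
<span class="code">build_trie</span> and <span class="code">longest_at</span> as well as the use of the empty key
as an end-of-word marker are conventions chosen for this example. Word boundaries are not checked here.
</p>
<collapsible>
<pre class="code"><code class="perl">use strict;
use warnings;

# Build a character-wise prefix tree from nested hashes.
# The key '' marks the end of a complete word.
sub build_trie {
  my %trie;
  for my $word (@_) {
    my $node = \%trie;
    $node = $node-&gt;{$_} ||= {} for split //, $word;
    $node-&gt;{''} = 1;
  }
  return \%trie;
}

# Return the longest listed word starting at position $pos of $text, or undef.
sub longest_at {
  my ($trie, $text, $pos) = @_;
  my ($node, $best) = ($trie, undef);
  for my $i ($pos .. length($text) - 1) {
    # one hash lookup per character; stop as soon as no branch fits
    $node = $node-&gt;{ substr $text, $i, 1 } or last;
    $best = substr $text, $pos, $i - $pos + 1 if $node-&gt;{''};
  }
  return $best;
}

my $trie = build_trie(qw( for far bar foo boofaz boofar boof faz foobaz foobars boofar ));
print longest_at($trie, 'xx foobars yy', 3), "\n";
print longest_at($trie, 'xx foobar yy', 3), "\n";</code></pre>
</collapsible>
<p>
With this layout, each character of the text costs a single hash lookup, however many words the list contains.
</p>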
<p>
Actually, I'm not sure whether regex engines apply optimizations like that
when compiling. I guess they do, so it might be unnecessary to replace the regex <span class="code">(1)</span> above
with the optimized version, which implements the sorted tree of alternative and optional nodes:
</p>
<collapsible>
<pre class="code">
(2) /\b(b(?:ar|oof(?:a(?:r|z))?)|f(?:a(?:r|z)|o(?:o(?:ba(?:rs|z))?|r)))\b/
</pre>
</collapsible>
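<p>
A quick way to convince yourself that <span class="code">(1)</span> and <span class="code">(2)</span> accept the same words
is to run both against the wordlist and against a few strings that are not in it. The extra strings in
<span class="code">@others</span> are picked arbitrarily for this check:
</p>
<collapsible>
<pre class="code"><code class="perl">use strict;
use warnings;

my @words = qw( for far bar foo boofaz boofar boof faz foobaz foobars boofar );
my $re1 = qr/\b(for|far|bar|foo|boofaz|boofar|boof|faz|foobaz|foobars|boofar)\b/;
my $re2 = qr/\b(b(?:ar|oof(?:a(?:r|z))?)|f(?:a(?:r|z)|o(?:o(?:ba(?:rs|z))?|r)))\b/;

# A few strings that are not in the list (chosen for illustration).
my @others = qw( fo boo foob fooba foobar foobarz boofa barf );

for my $w (@words, @others) {
  my $m1 = $w =~ $re1 ? 1 : 0;
  my $m2 = $w =~ $re2 ? 1 : 0;
  printf "%-8s (1)=%d (2)=%d\n", $w, $m1, $m2;
  die "regexes disagree on '$w'" unless $m1 == $m2;
}</code></pre>
</collapsible>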
<p>
Nevertheless, I couldn't resist writing a little Perl routine that folds a wordlist into an
optimized regex. Here it is:
</p>
<collapsible>
<pre class="code"><code class="perl"><a class="kwd" href="http://perldoc.perl.org/functions/sub.html" target="_blank" rel="nofollow">sub</a> <span class="symb">foldWordsToRegex</span> <span class="op ld">{</span>

  <a class="kwd" href="http://perldoc.perl.org/functions/local.html" target="_blank" rel="nofollow">local</a> *<span class="symb">toString</span> = <a class="kwd" href="http://perldoc.perl.org/functions/sub.html" target="_blank" rel="nofollow">sub</a> <span class="op ld">{</span>
    <span class="cmnt">## node: [ prefix, [ nodes ], opt ]</span>

    <a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="op ld">(</span><span class="var">$<span class="symb">prefix</span></span>, <span class="var">$<span class="symb">nodes</span></span>, <span class="var">$<span class="symb">opt</span></span><span class="op rd">)</span> = <span class="var">$<span class="op ld">{</span><span class="var">$_<span class="op ld">[</span>0<span class="op rd">]</span></span><span class="op rd">}</span></span><span class="op stmt">;</span>
    <a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="var">$<span class="symb">rv</span></span> = <a class="kwd" href="http://perldoc.perl.org/functions/quotemeta.html" target="_blank" rel="nofollow">quotemeta</a> <span class="var">$<span class="symb">prefix</span></span><span class="op stmt">;</span>
    <span class="kwd">if</span> <span class="op ld">(</span> <a class="kwd" href="http://perldoc.perl.org/functions/ref.html" target="_blank" rel="nofollow">ref</a> <span class="var">$<span class="symb">nodes</span></span> <span class="symb">eq</span> <span class="qlo q"><span class="kwd">q</span><span class="op">|</span><span class="str">ARRAY</span><span class="op">|</span></span> &amp;&amp; <span class="var">@$<span class="symb">nodes</span></span> <span class="op rd">)</span>
    <span class="op ld">{</span>
      <span class="var">$<span class="symb">rv</span></span> .= <span class="str">&#039;(?:&#039;</span>.<span class="op ld">(</span><a class="kwd" href="http://perldoc.perl.org/functions/join.html" target="_blank" rel="nofollow">join</a> <span class="str">&#039;|&#039;</span>, <a class="kwd" href="http://perldoc.perl.org/functions/map.html" target="_blank" rel="nofollow">map</a> <span class="op ld">{</span> <span class="symb">toString</span><span class="op ld">(</span><span class="var">$_</span><span class="op rd">)</span> <span class="op rd">}</span> <span class="var">@$<span class="symb">nodes</span></span><span class="op rd">)</span>.<span class="str">&#039;)&#039;</span><span class="op stmt">;</span>
      <span class="var">$<span class="symb">rv</span></span> .= <span class="str">&#039;?&#039;</span> <span class="kwd">if</span> <span class="var">$<span class="symb">opt</span></span><span class="op stmt">;</span>
    <span class="op rd">}</span>
    <span class="var">$<span class="symb">rv</span></span><span class="op stmt">;</span>
  <span class="op rd">}</span><span class="op stmt">;</span>

  <a class="kwd" href="http://perldoc.perl.org/functions/local.html" target="_blank" rel="nofollow">local</a> *<span class="symb">fold</span> = <a class="kwd" href="http://perldoc.perl.org/functions/sub.html" target="_blank" rel="nofollow">sub</a><span class="op ld">(</span><span class="var">@_</span><span class="op rd">)</span> <span class="op ld">{</span>

    <a class="kwd" href="http://perldoc.perl.org/functions/sub.html" target="_blank" rel="nofollow">sub</a> <span class="symb">reduce</span><span class="op ld">(</span>$<span class="op rd">)</span><span class="op stmt">;</span>
    <a class="kwd" href="http://perldoc.perl.org/functions/local.html" target="_blank" rel="nofollow">local</a> *<span class="symb">reduce</span> = <a class="kwd" href="http://perldoc.perl.org/functions/sub.html" target="_blank" rel="nofollow">sub</a> <span class="op ld">{</span>
      <a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="op ld">(</span><span class="var">$<span class="symb">prefix</span></span>, <span class="var">$<span class="symb">nodes</span></span>, <span class="var">$<span class="symb">opt</span></span><span class="op rd">)</span> = <span class="var">$<span class="op ld">{</span><span class="var">$_<span class="op ld">[</span>0<span class="op rd">]</span></span><span class="op rd">}</span></span><span class="op stmt">;</span>

      <a class="kwd" href="http://perldoc.perl.org/functions/return.html" target="_blank" rel="nofollow">return</a> <span class="var">$_<span class="op ld">[</span>0<span class="op rd">]</span></span> <span class="kwd">unless</span> <a class="kwd" href="http://perldoc.perl.org/functions/ref.html" target="_blank" rel="nofollow">ref</a> <span class="var">$<span class="symb">nodes</span></span> <span class="symb">eq</span> <span class="qlo q"><span class="kwd">q</span><span class="op">|</span><span class="str">ARRAY</span><span class="op">|</span></span> &amp;&amp; <span class="var">@$<span class="symb">nodes</span></span> &gt; <span class="num">1</span><span class="op stmt">;</span>

      <span class="cmnt">## 1st char of the prefix of 1st node in list</span>
      <a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="op ld">(</span><span class="var">$<span class="symb">c</span></span>, <span class="var">$<span class="symb">qc</span></span><span class="op rd">)</span><span class="op stmt">;</span>

      <span class="cmnt">## check whether 2nd prefix starts with same letter as the 1st</span>
      <span class="kwd">if</span> <span class="op ld">(</span> <a class="kwd" href="http://perldoc.perl.org/functions/length.html" target="_blank" rel="nofollow">length</a> <span class="var">$<span class="symb">nodes</span><span class="op ptr">-&gt;</span><span class="op ld">[</span>0<span class="op rd">]</span><span class="op ld">[</span>0<span class="op rd">]</span></span> <span class="op rd">)</span>
      <span class="op ld">{</span>
        <span class="var">$<span class="symb">c</span></span> = <a class="kwd" href="http://perldoc.perl.org/functions/substr.html" target="_blank" rel="nofollow">substr</a> <span class="var">$<span class="symb">nodes</span><span class="op ptr">-&gt;</span><span class="op ld">[</span>0<span class="op rd">]</span><span class="op ld">[</span>0<span class="op rd">]</span></span>, <span class="num">0</span>, <span class="num">1</span><span class="op stmt">;</span>
        <span class="var">$<span class="symb">qc</span></span> = <a class="kwd" href="http://perldoc.perl.org/functions/quotemeta.html" target="_blank" rel="nofollow">quotemeta</a> <span class="var">$<span class="symb">c</span></span><span class="op stmt">;</span>
        <span class="var">$<span class="symb">nodes</span><span class="op ptr">-&gt;</span><span class="op ld">[</span>1<span class="op rd">]</span><span class="op ld">[</span>0<span class="op rd">]</span></span> =~ <span class="symb">m</span>/^<span class="var">$<span class="symb">qc</span></span>/ <span class="kwd">or</span> <a class="kwd" href="http://perldoc.perl.org/functions/undef.html" target="_blank" rel="nofollow">undef</a> <span class="var">$<span class="symb">c</span></span><span class="op stmt">;</span>
      <span class="op rd">}</span>

      <span class="kwd">unless</span> <span class="op ld">(</span> <a class="kwd" href="http://perldoc.perl.org/functions/defined.html" target="_blank" rel="nofollow">defined</a> <span class="var">$<span class="symb">c</span></span> <span class="op rd">)</span>
      <span class="op ld">{</span>
        <a class="kwd" href="http://perldoc.perl.org/functions/return.html" target="_blank" rel="nofollow">return</a> <span class="var">$_<span class="op ld">[</span>0<span class="op rd">]</span></span> <span class="kwd">unless</span> <span class="var">@$<span class="symb">nodes</span></span> &gt; <span class="num">2</span><span class="op stmt">;</span>

        <span class="cmnt">## try to reduce next list part</span>
        <a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="var">$<span class="symb">first</span></span> = <a class="kwd" href="http://perldoc.perl.org/functions/shift.html" target="_blank" rel="nofollow">shift</a> <span class="var">@$<span class="symb">nodes</span></span><span class="op stmt">;</span>
        <a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="var">$<span class="symb">next</span></span> = <span class="symb">reduce</span> <span class="op ld">[</span><span class="str">&#039;&#039;</span>, <span class="var">$<span class="symb">nodes</span></span>, <span class="num">0</span><span class="op rd">]</span><span class="op stmt">;</span>
        <a class="kwd" href="http://perldoc.perl.org/functions/return.html" target="_blank" rel="nofollow">return</a> <span class="op ld">[</span> <span class="var">$<span class="symb">prefix</span></span>, <span class="op ld">[</span> <span class="var">$<span class="symb">first</span></span>, <span class="var">$<span class="symb">next</span></span> <span class="op rd">]</span>, <span class="var">$<span class="symb">opt</span></span><span class="op rd">]</span> <span class="kwd">if</span> <a class="kwd" href="http://perldoc.perl.org/functions/length.html" target="_blank" rel="nofollow">length</a> <span class="var">$<span class="symb">next</span><span class="op ptr">-&gt;</span><span class="op ld">[</span>0<span class="op rd">]</span></span><span class="op stmt">;</span>

        <span class="cmnt">## couldn&#039;t be reduced</span>
        <a class="kwd" href="http://perldoc.perl.org/functions/return.html" target="_blank" rel="nofollow">return</a> <span class="op ld">[</span> <span class="var">$<span class="symb">prefix</span></span>, <span class="op ld">[</span> <span class="var">$<span class="symb">first</span></span>, <span class="var">$<span class="op ld">{</span><span class="var">$<span class="symb">next</span><span class="op ptr">-&gt;</span><span class="op ld">[</span>1<span class="op rd">]</span></span><span class="op rd">}</span></span> <span class="op rd">]</span>, <span class="var">$<span class="symb">opt</span></span> <span class="op rd">]</span><span class="op stmt">;</span>
      <span class="op rd">}</span>

      <span class="cmnt">## reduce any ensuing node whose prefix starts with $c</span>
      <a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="var">@<span class="symb">new</span></span><span class="op stmt">;</span>
      <a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="var">$<span class="symb">newopt</span></span> = <span class="num">0</span><span class="op stmt">;</span>
      <a class="kwd" href="http://perldoc.perl.org/functions/while.html" target="_blank" rel="nofollow">while</a> <span class="op ld">(</span> <span class="var">@$<span class="symb">nodes</span></span> <span class="op rd">)</span>
      <span class="op ld">{</span>
        <span class="var">$<span class="symb">nodes</span><span class="op ptr">-&gt;</span><span class="op ld">[</span>0<span class="op rd">]</span><span class="op ld">[</span>0<span class="op rd">]</span></span> =~ <span class="symb">s</span>/^<span class="var">$<span class="symb">qc</span></span>// <span class="kwd">or</span> <a class="kwd" href="http://perldoc.perl.org/functions/last.html" target="_blank" rel="nofollow">last</a><span class="op stmt">;</span>

        <span class="cmnt">## reduce node or detect new optional node</span>
        <a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="var">$<span class="symb">n</span></span> = <a class="kwd" href="http://perldoc.perl.org/functions/shift.html" target="_blank" rel="nofollow">shift</a> <span class="var">@$<span class="symb">nodes</span></span><span class="op stmt">;</span>
        <span class="kwd">if</span> <span class="op ld">(</span> <a class="kwd" href="http://perldoc.perl.org/functions/length.html" target="_blank" rel="nofollow">length</a> <span class="var">$<span class="symb">n</span><span class="op ptr">-&gt;</span><span class="op ld">[</span>0<span class="op rd">]</span></span> <span class="op rd">)</span>
        <span class="op ld">{</span>
          <a class="kwd" href="http://perldoc.perl.org/functions/push.html" target="_blank" rel="nofollow">push</a> <span class="var">@<span class="symb">new</span></span>, <span class="var">$<span class="symb">n</span></span><span class="op stmt">;</span>
          <a class="kwd" href="http://perldoc.perl.org/functions/next.html" target="_blank" rel="nofollow">next</a><span class="op stmt">;</span>
        <span class="op rd">}</span>
        <span class="var">$<span class="symb">newopt</span></span> = <span class="num">1</span><span class="op stmt">;</span>
      <span class="op rd">}</span>

      <span class="kwd">if</span> <span class="op ld">(</span> <span class="var">@$<span class="symb">nodes</span></span> || <span class="var">$<span class="symb">opt</span></span> <span class="op rd">)</span>
      <span class="op ld">{</span>
        <a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="var">$<span class="symb">new</span></span> = <span class="symb">reduce</span> <span class="op ld">[</span> <span class="var">$<span class="symb">c</span></span>, <span class="op ld">[</span> <span class="var">@<span class="symb">new</span></span> <span class="op rd">]</span>, <span class="var">$<span class="symb">newopt</span></span> <span class="op rd">]</span><span class="op stmt">;</span>
        <span class="kwd">if</span> <span class="op ld">(</span> <span class="var">@$<span class="symb">nodes</span></span> <span class="op rd">)</span>
        <span class="op ld">{</span>
          <span class="cmnt">## reduce remaining nodes</span>
          <a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="var">$<span class="symb">next</span></span> = <span class="symb">reduce</span> <span class="op ld">[</span><span class="str">&#039;&#039;</span>, <span class="var">$<span class="symb">nodes</span></span>, <span class="num">0</span><span class="op rd">]</span><span class="op stmt">;</span>
          <a class="kwd" href="http://perldoc.perl.org/functions/return.html" target="_blank" rel="nofollow">return</a> <span class="op ld">[</span> <span class="var">$<span class="symb">prefix</span></span>, <span class="op ld">[</span> <span class="var">$<span class="symb">new</span></span>, <span class="var">$<span class="symb">next</span></span> <span class="op rd">]</span>, <span class="var">$<span class="symb">opt</span></span><span class="op rd">]</span> <span class="kwd">if</span> <a class="kwd" href="http://perldoc.perl.org/functions/length.html" target="_blank" rel="nofollow">length</a> <span class="var">$<span class="symb">next</span><span class="op ptr">-&gt;</span><span class="op ld">[</span>0<span class="op rd">]</span></span><span class="op stmt">;</span>

          <span class="cmnt">## couldn&#039;t be reduced</span>
          <a class="kwd" href="http://perldoc.perl.org/functions/return.html" target="_blank" rel="nofollow">return</a> <span class="op ld">[</span> <span class="var">$<span class="symb">prefix</span></span>, <span class="op ld">[</span> <span class="var">$<span class="symb">new</span></span>, <span class="var">$<span class="op ld">{</span><span class="var">$<span class="symb">next</span><span class="op ptr">-&gt;</span><span class="op ld">[</span>1<span class="op rd">]</span></span><span class="op rd">}</span></span> <span class="op rd">]</span>, <span class="var">$<span class="symb">opt</span></span> <span class="op rd">]</span><span class="op stmt">;</span>
        <span class="op rd">}</span>

        <span class="cmnt">## current node is optional</span>
        <a class="kwd" href="http://perldoc.perl.org/functions/return.html" target="_blank" rel="nofollow">return</a> <span class="op ld">[</span> <span class="var">$<span class="symb">prefix</span></span>, <span class="op ld">[</span> <span class="var">$<span class="symb">new</span></span>, <span class="var">@$<span class="symb">nodes</span></span> <span class="op rd">]</span>, <span class="var">$<span class="symb">opt</span></span> <span class="op rd">]</span><span class="op stmt">;</span>
      <span class="op rd">}</span>

      <span class="cmnt">## nothing left to reduce</span>
      <span class="symb">reduce</span> <span class="op ld">[</span> <span class="var">$<span class="symb">prefix</span></span>.<span class="var">$<span class="symb">c</span></span>, <span class="op ld">[</span> <span class="var">@<span class="symb">new</span></span> <span class="op rd">]</span>, <span class="var">$<span class="symb">newopt</span></span> <span class="op rd">]</span><span class="op stmt">;</span>
    <span class="op rd">}</span><span class="op stmt">;</span>

    <span class="symb">reduce</span> <span class="op ld">[</span> <span class="str">&#039;&#039;</span>, <span class="op ld">[</span><span class="op ld">(</span> <a class="kwd" href="http://perldoc.perl.org/functions/map.html" target="_blank" rel="nofollow">map</a> <span class="op ld">{</span> <span class="op ld">[</span><span class="var">$_</span><span class="op rd">]</span> <span class="op rd">}</span> <a class="kwd" href="http://perldoc.perl.org/functions/sort.html" target="_blank" rel="nofollow">sort</a> <span class="var">@_</span> <span class="op rd">)</span><span class="op rd">]</span>, <span class="num">0</span><span class="op rd">]</span><span class="op stmt">;</span>
  <span class="op rd">}</span><span class="op stmt">;</span>

  <span class="symb">toString</span><span class="op ld">(</span><span class="symb">fold</span><span class="op ld">(</span><span class="var">@_</span><span class="op rd">)</span><span class="op rd">)</span><span class="op stmt">;</span>
<span class="op rd">}</span><span class="op stmt">;</span></code></pre>
</collapsible>
<p>
Well, not so easy, but it works :)
</p>
<p>
Here, the inner recursive function <span class="code">fold</span> builds the actual tree, whose nodes
are arrays consisting of a prefix, a list of subnodes, and a flag marking optional nodes.
The second inner function <span class="code">toString</span> then creates the actual regular
expression string from that tree.
So, for instance, calling
</p>
<collapsible>
<pre class="code"><code class="perl">&amp;<span class="symb">foldWordsToRegex</span><span class="op ld">(</span><span class="qlo qw"><span class="kwd">qw</span><span class="op">(</span><span class="istr"> for far bar foo boofaz boofar boof faz foobaz foobars boofar </span><span class="op">)</span></span><span class="op rd">)</span></code></pre>
</collapsible>
<p>
would return the regex <span class="code">(2)</span>.
</p>]]>
    </content>
</entry>

<entry>
    <title>understanding unicode surrogates / or: how to deal with Linear B strings in .NET</title>
    <link rel="alternate" type="text/html" href="http://www.autobugfix.com/2009/11/understanding-unicode-surrogates-or-how-to-deal-with-linear-b-strings-in-net.html" />
    <id>tag:www.autobugfix.com,2009://3.7</id>

    <published>2009-11-17T20:23:58Z</published>
    <updated>2013-05-01T21:11:10Z</updated>

    <summary>Remember a String object in .NET is a collection of Char objects, where a Char object in turn is announced as a unicode character, encoded by a 16bit unsigned integer. Thus, more precisely speaking, a single Char object is able to encode any codepoint within the basic multilingual plane (BMP), i.e. between U+0000 and U+FFFF. So, where does the rest of the story go? Unicode, as a universal character set, is designed to support far more than 65536 characters, of course.</summary>
    <author>
        <name>admin</name>
        <uri>http://www.autobugfix.com</uri>
    </author>
    
        <category term=".NET" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="idna" label="IDNA" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="math" label="math" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="unicode" label="unicode" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.autobugfix.com/">
        <![CDATA[<p>
Remember a <span class="code cs1"><span class="ob">String</span></span> object
in .NET is a collection of <span class="code cs1"><span class="ob">Char</span></span>
objects, where a <span class="code cs1"><span class="ob">Char</span></span> object
in turn is announced as a
<a href="http://en.wikipedia.org/wiki/Unicode" target="_blank">unicode character</a>,
encoded by a 16bit unsigned integer.
</p>
]]>
        <![CDATA[<p>
Thus, more precisely speaking, a single <span class="code cs1"><span class="ob">Char</span></span>
object is able to encode any codepoint within the
<a href="http://en.wikipedia.org/wiki/Mapping_of_Unicode_character_planes#Basic_Multilingual_Plane" target="_blank">basic multilingual plane (BMP)</a>,
i.e. between <span class="code">U+0000</span> and <span class="code">U+FFFF</span>.
So, where does the rest of the story go? Unicode, as a universal character set,
is designed to support far more than 65536 characters, of course.
</p>
<p>
Now, the trick is to encode code points above <span class="code">2<sup>16</sup></span>
by so-called surrogates, that is, by pairs of 16bit integers.
To see how this works, remember the well-known
<a href="http://en.wikipedia.org/wiki/Division_algorithm" target="_blank">division algorithm</a>
for integers. That is, if you fix an exponent <span class="math">M</span> for the upper bound and
an integer constant <span class="math">C (0 &lt; C &lt; M)</span>, then
for any integer <span class="math">N</span> within the range of
<span class="math">0 &le; N &lt; 2<sup>M</sup></span>,
there exists a unique pair of integers <span class="math">H,L</span>, such that
</p>
<p class="quote">
<span class="math">N = 2<sup>C</sup> * H + L,</span> where <span class="math">0 &le; L &lt; 2<sup>C</sup></span> and <span class="math">0 &le; H &lt; 2<sup>M - C</sup></span>.
</p>
<p>
That way you have simply encoded these <span class="math">2<sup>M</sup></span> numbers
<span class="math">N</span> by <span class="math">2<sup>C</sup> * 2<sup>M - C</sup></span> pairs
of numbers <span class="math">H,L</span>.
Hence <span class="math">2<sup>M</sup></span> large numbers are addressed using a set of only
<span class="math">2<sup>C</sup> + 2<sup>M-C</sup></span> small numbers; that's the trick.
</p>
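<p>
The decomposition above is nothing more than integer division with remainder.
As a small sketch (in Python, with the helper names <span class="code">split</span> and
<span class="code">join</span> made up for this illustration), it might look like this:
</p>

```python
def split(n, c, m):
    """Split n (0 <= n < 2**m) into the pair (h, l) with n == 2**c * h + l."""
    if not 0 <= n < 2 ** m:
        raise ValueError("n must satisfy 0 <= n < 2**m")
    # divmod returns quotient and remainder, i.e. exactly H and L
    return divmod(n, 2 ** c)


def join(h, l, c):
    """Inverse of split: rebuild n from the pair (h, l)."""
    return 2 ** c * h + l


if __name__ == "__main__":
    h, l = split(0x12345, c=10, m=20)
    print(h, l)                      # 72 837
    print(hex(join(h, l, c=10)))     # 0x12345
```

<p>
With <span class="math">M = 20</span> and <span class="math">C = 10</span>, both components stay
below 1024, so every 20bit number is described by two 10bit numbers.
</p>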

<p>
As we are interested in encoding integers above <span class="math">2<sup>16</sup></span>
by pairs of 16bit integers, we should act on the assumption
</p>
<p class="quote">
<span class="math">2<sup>16</sup> &le; N' &lt; 2<sup>16</sup> + 2<sup>M</sup></span>,
</p>
<p>
dealing with <span class="code">N = N' - 2<sup>16</sup></span> then.
In order to decide whether any 16bit number does belong to a surrogate pair,
playing either the role of <span class="code">H</span> or <span class="code">L</span>,
finally fix an adequate constant <span class="code">T</span> and set
</p>
<p class="quote">
<span class="math">H' = H + T, L' = L + T + 2<sup>M - C</sup>,</span>
</p>
<p>
thus having tagged all 16bit integers <span class="math">I</span> satisfying
<span class="math">T &le; I &lt; T + 2<sup>M-C</sup> + 2<sup>C</sup></span>
as surrogate integers, where the high surrogates of type <span class="math">H'</span>
are less than <span class="math">T + 2<sup>M-C</sup></span> and
the ones at or above that bound are the low surrogates of type <span class="math">L'</span>.
</p>

<p>
Now, the setting of unicode is this: <span class="math">C = 10, M = 20, T = 0xD800</span>.
So, by reserving 2048 small integers as
surrogates, more than a million additional codepoints up to
<span class="code">U+10FFFF</span> become accessible. The resulting formulas may be found here:
<a href="http://www.unicode.org/book/ch03.pdf" target="_blank">http://www.unicode.org/book/ch03.pdf</a>.
</p>
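<p>
With these constants the whole scheme fits into a few lines. The following Python sketch only
illustrates the arithmetic above; the function names are assumptions of this example, and
real code would normally rely on a library's UTF-16 encoder instead.
</p>

```python
C, M, T = 10, 20, 0xD800
LOW_BASE = 0xDC00   # first low surrogate; equals T + 2**10, since C and M - C coincide here


def to_surrogates(code_point):
    """Encode a code point in U+10000..U+10FFFF as a (high, low) surrogate pair."""
    n = code_point - 0x10000                 # N = N' - 2**16
    if not 0 <= n < 2 ** M:
        raise ValueError("code point outside U+10000..U+10FFFF")
    h, l = divmod(n, 2 ** C)                 # N = 2**C * H + L
    return T + h, LOW_BASE + l


def from_surrogates(high, low):
    """Decode a (high, low) surrogate pair back into the code point."""
    return 0x10000 + 2 ** C * (high - T) + (low - LOW_BASE)


if __name__ == "__main__":
    high, low = to_surrogates(0x10016)
    print("%04X %04X" % (high, low))               # D800 DC16
    print("U+%06X" % from_surrogates(high, low))   # U+010016
```

<p>
The largest codepoint <span class="code">U+10FFFF</span> ends up as the pair
<span class="code">DBFF DFFF</span>, i.e. the last high and the last low surrogate.
</p>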

<p>
Thankfully, .NET unicoders don't need to deal with hex numbers at all, because the
framework does this arithmetic for them.
For instance, consider the name of
<a href="http://en.wikipedia.org/wiki/Amnisos" target="_blank">Amnissos</a>
written in <a href="http://en.wikipedia.org/wiki/Linear_B" target="_blank">Linear B</a>:
</p>
<p class="quote">
<img src="http://www.autobugfix.com/2009/11/18/linearb_u10000.gif" alt="U+10000" /><img src="http://www.autobugfix.com/2009/11/18/linearb_u10016.gif" alt="U+10016" /><img src="http://www.autobugfix.com/2009/11/18/linearb_u1001B.gif" alt="U+1001B" /><img src="http://www.autobugfix.com/2009/11/18/linearb_u10030.gif" alt="U+10030" /></p>
<p>
In C# it looks like this:
</p>
<collapsible>
<pre class="code"><code class="csharpnet"><span class="cmnt">// alternatively the Char.ConvertFromUtf32() method may be used</span>
<span class="kwd builtin">string</span> <span class="type">amnisos</span> = <span class="str">&quot;\U00010000&quot;</span> + <span class="str">&quot;\U00010016&quot;</span> + <span class="str">&quot;\U0001001B&quot;</span> + <span class="str">&quot;\U00010030&quot;</span>;</code></pre>
</collapsible>
<p>
Note that the <span class="code cs1"><span class="sym">Length</span></span> property
of the resulting string indeed has a value of 8, although it contains only 4 unicode characters.
So the appropriate way of accessing the actual codepoints of an arbitrary string
is to use
<span class="code">System.Globalization.TextElementEnumerator</span>
rather than naively reading its individual
<span class="code cs1"><span class="ob">Char</span></span> objects.
It goes like this:
</p>
<collapsible>
<pre class="code"><code class="csharpnet"><span class="cmnt">// using System.Globalization;</span>
<a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.globalization.textelementenumerator.aspx" target="_blank" rel="nofollow">TextElementEnumerator</a> <span class="type">en</span> = <a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.globalization.stringinfo.aspx" target="_blank" rel="nofollow">StringInfo</a>.<span class="type">GetTextElementEnumerator</span>(<span class="type">amnisos</span>);
<span class="kwd builtin">while</span> (<span class="type">en.MoveNext</span>())
{
  <span class="kwd builtin">string</span> <span class="type">current</span> = <span class="type">en.GetTextElement</span>();
  <span class="kwd builtin">if</span> (<a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.char.aspx" target="_blank" rel="nofollow">Char</a>.<span class="type">IsSurrogate</span>(<span class="type">current</span>, <span class="num">0</span>))
  {
    <span class="cmnt">// a surrogate pair encoding one character, i.e. current.Length == 2</span>
    <span class="kwd builtin">int</span> <span class="type">codepoint</span> = <a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.char.aspx" target="_blank" rel="nofollow">Char</a>.<span class="type">ConvertToUtf32</span>(<span class="type">current</span>[<span class="num">0</span>], <span class="type">current</span>[<span class="num">1</span>]);
    <a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.console.aspx" target="_blank" rel="nofollow">Console</a>.<span class="type">WriteLine</span>(<span class="str">&quot;U+{0:X6}&quot;</span>, <span class="type">codepoint</span>);
  }
  <span class="kwd builtin">else</span>
  {
    <span class="cmnt">// characters within BMP:</span>
    <span class="cmnt">// current.Length &gt; 1 may be true in case of combining characters </span>
    <span class="cmnt">// cf. StringInfo.ParseCombiningCharacters()</span>
    <span class="kwd builtin">foreach</span> (<span class="kwd builtin">char</span> <span class="type">c</span> <span class="kwd builtin">in</span> <span class="type">current</span>)
    {
      <span class="kwd builtin">int</span> <span class="type">codepoint</span> = (<span class="kwd builtin">int</span>)<span class="type">c</span>; <span class="cmnt">// use AscW() in VB.NET</span>
      <a class="kwd def" href="http://msdn.microsoft.com/en-us/library/system.console.aspx" target="_blank" rel="nofollow">Console</a>.<span class="type">WriteLine</span>(<span class="str">&quot;U+{0:X4}&quot;</span>, <span class="type">codepoint</span>);
    }
  }
}</code></pre>
</collapsible>
<p>
Now, when will we finally be able to register Linear B domain names? ;)
</p>]]>
    </content>
</entry>

<entry>
    <title>Fun with European domain names</title>
    <link rel="alternate" type="text/html" href="http://www.autobugfix.com/2009/11/fun-with-european-domain-names.html" />
    <id>tag:www.autobugfix.com,2009://3.8</id>

    <published>2009-11-12T23:44:36Z</published>
    <updated>2013-04-02T05:13:51Z</updated>

    <summary>Starting 10 December 2009, companies and private persons based in the European
Union will be able to register .eu Internationalised Domain Names.</summary>
    <author>
        <name>admin</name>
        <uri>http://www.autobugfix.com</uri>
    </author>
    
        <category term="etc." scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="codes" label="codes" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="domains" label="domains" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="unicode" label="unicode" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.autobugfix.com/">
        <![CDATA[<p>
Starting 10 December 2009, companies and private persons based in the European
Union will be able to register
<a href="http://www.eurid.eu/en/eu-domain-names/idns-eu" target="_blank">.eu Internationalised Domain Names</a>.
The <a href="http://www.eurid.eu/en/eu-domain-names/idns-eu/supported-characters" target="_blank">list of supported characters</a>
is divided into several parts, called IDN scripts,
such as &quot;Latin-1 supplement&quot;, &quot;Greek extended&quot;, &quot;Cyrillic&quot; and the like.
Indeed, I might consider getting
</p>

<pre class="code">
<a href="http://beta-blog.net" target="_blank">http://www.β-ιστός-κούτσουρο.eu</a>
</pre>

<p>
Unfortunately, one cannot mix several scripts, thus β-blog.eu won&#8217;t be a valid name,
since ASCII-letters belong to the Latin script while β is Greek. (Well, so, I think
I&#8217;ll give up that idea ;))
</p>

<p>
To get serious: as an <a href="http://www.http.net" target="_blank">EURID registrar</a>, it&#8217;s time for us to check out several
issues that may apply to IDN requests built from all those strange letters
Europeans may use.
</p>

<p>
For instance, note that
</p>

<pre class="code">
a1.eu
</pre>

<p>
and
</p>

<pre class="code">
а1.eu
</pre>

<p>
are completely different domain names. Their resemblance is just an optical
trick, since the first one starts with an ordinary ASCII &quot;a&quot; while the second
starts with U+0430, which is the <a href="http://en.wikipedia.org/wiki/Unicode" target="_blank">unicode</a> notation of the Cyrillic small letter &quot;a&quot;.
Indeed, when you type the second one into your browser, it will first calculate
the corresponding ACE string xn--1-7sb.eu using the <a href="http://en.wikipedia.org/wiki/Punycode" target="_blank">punycode algorithm</a> and will then
make the DNS request with it.
</p>
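<p>
The conversion can be reproduced, for instance, with the punycode codec that ships with Python.
This is only a sketch: the helper name <span class="code">to_ace</span> is made up here, and it
merely adds the ACE prefix to the raw punycode of one label, without any further preparation.
</p>

```python
def to_ace(label):
    """Punycode-encode a single label and add the ACE prefix (no nameprep applied)."""
    return "xn--" + label.encode("punycode").decode("ascii")


if __name__ == "__main__":
    latin = "a1"                   # ASCII only, is sent to the DNS unchanged
    cyrillic = "\u0430" + "1"      # U+0430 CYRILLIC SMALL LETTER A, then "1"
    print(latin == cyrillic)       # False
    print(to_ace(cyrillic))        # xn--1-7sb
```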

<p>
Things are getting more complicated when you notice that
</p>

<pre class="code">
aŀt.eu
</pre>

<p>
on the one hand, and
</p>

<pre class="code">
al·t.eu
</pre>

<p>
on the other hand, are indeed the same domain name.
Applying just the punycode algorithm to each of them, however, you will get
</p>

<pre class="code">
xn--at-rqa.eu
</pre>

<p>
for the first one and
</p>

<pre class="code">
xn--alt-mga.eu
</pre>

<p>
for the second one, because the two byte streams differ.
Now, something goes wrong here: when you ask a nameserver for the
IP address of the domain, you have to decide which of the two names to ask for.
It is unlikely that a nameserver will answer to both of them.
</p>

<p>
Well, according to the IDNA standard as defined in RFC 3490, applications
not only have to apply punycode to IDNs, but also have to apply the
nameprep algorithm first, which in turn consists of several normalization mappings
such as lower case conversion and, more interestingly, the
Unicode Normalization Form KC
(see <a href="http://unicode.org/reports/tr15/" target="_blank">http://unicode.org/reports/tr15/</a>).
The latter decomposes characters by unicode compatibility equivalence and then recomposes them canonically.
Thus, the character U+0140, the Latin small letter &quot;l&quot; with middle dot,
decomposes into two unicode characters:
</p>

<pre class="code">
U+0140 =&gt; U+006C + U+00B7
</pre>

<p>
That is, the middle dot is separated from the letter &quot;l&quot;. Therefore, xn--at-rqa.eu
is an application of punycode, but not a conversion in the sense of the IDNA standard.
Indeed, you will get different results from different so-called IDN converter libraries
for that domain, depending on whether they are just doing punycode or applying a proper nameprep
first. A reliable reference is the Verisign conversion tool
(<a href="http://mct.verisign-grs.com/index.shtml" target="_blank">http://mct.verisign-grs.com/index.shtml</a>)
and the corresponding SDK, for example (although I wasn&#8217;t able to get the Win32 version working).
</p>
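<p>
The difference between &quot;just punycode&quot; and a nameprep-style conversion can be sketched in Python.
Note the simplification: <span class="code">prepared</span> only lower-cases and applies NFKC, whereas
the real nameprep profile also has its own mapping and prohibition tables. Both function names are
assumptions of this example.
</p>

```python
import unicodedata


def punycode_only(label):
    """What a naive converter does: Punycode plus ACE prefix, nothing else."""
    return "xn--" + label.encode("punycode").decode("ascii")


def prepared(label):
    """Simplified stand-in for nameprep: lower-casing followed by NFKC."""
    text = unicodedata.normalize("NFKC", label.lower())
    if text.isascii():
        return text                # plain ASCII labels need no ACE form
    return punycode_only(text)


if __name__ == "__main__":
    composed = "a\u0140t"          # U+0140 LATIN SMALL LETTER L WITH MIDDLE DOT
    separate = "al\u00b7t"         # "l" followed by U+00B7 MIDDLE DOT
    print(punycode_only(composed), punycode_only(separate))   # xn--at-rqa xn--alt-mga
    print(prepared(composed), prepared(separate))             # xn--alt-mga xn--alt-mga
```

<p>
Only after normalization do both spellings lead to the same ACE string, and hence to the same DNS request.
</p>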

<p>
After all, the challenge for the registrar is to maintain the request database properly,
accepting IDN requests in both the normalized and any equivalent form. Moreover,
one has to check a requested name against the given character list, which contains
non-normalized letters, even if the requested name is already normalized.
Since normalization isn&#8217;t a reversible mapping, this may be complicated
in general, but it should be solvable in this case.
</p>
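<p>
One conceivable way around the missing inverse is to normalize the character list itself and to
test whether the normalized request can be assembled from the normalized characters. The following
Python sketch assumes that normalization acts on each listed character separately (it ignores
compositions across character boundaries); the function name and the character list excerpt
are hypothetical.
</p>

```python
import unicodedata


def nfkc(text):
    return unicodedata.normalize("NFKC", text)


def is_covered(name, allowed_chars):
    """Check whether the NFKC form of name can be spelled with allowed characters.

    Each allowed character is replaced by its NFKC image; the normalized name
    must then be a concatenation of such images.
    """
    images = {nfkc(ch) for ch in allowed_chars}
    target = nfkc(name)
    # reachable[i] is True when target[:i] is a concatenation of images
    reachable = [True] + [False] * len(target)
    for i in range(len(target)):
        if reachable[i]:
            for image in images:
                if target.startswith(image, i):
                    reachable[i + len(image)] = True
    return reachable[len(target)]


if __name__ == "__main__":
    # hypothetical excerpt of a character list; it contains U+0140 but not U+00B7
    allowed = set("abcdefghijklmnopqrstuvwxyz0123456789-") | {"\u0140"}
    print(is_covered("a\u0140t", allowed))    # True
    print(is_covered("al\u00b7t", allowed))   # True, the normalized spelling
    print(is_covered("a\u00b7t", allowed))    # False, a bare middle dot is not listed
```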
]]>
        

    </content>
</entry>

<entry>
    <title>no sleep till DENIC</title>
    <link rel="alternate" type="text/html" href="http://www.autobugfix.com/2009/10/no-sleep-till-denic.html" />
    <id>tag:www.autobugfix.com,2009://3.9</id>

    <published>2009-10-25T08:10:52Z</published>
    <updated>2013-04-01T14:56:07Z</updated>

    <summary>Thursday, 15.10.2009, 16:14.

An unspectacular working day is drawing to a close. An announcement from the
DENIC members' mailing list appears in the inbox. It mentions a ruling by the BGH
(Federal Court of Justice) and new domain guidelines. Since a court decision ordered
the registration of the domain vw.de, comparable domain names that were not permitted
so far are to be released as well: one- and two-character domains, numeric domains,
as well as top-level domains and German vehicle registration codes.</summary>
    <author>
        <name>admin</name>
        <uri>http://www.autobugfix.com</uri>
    </author>
    
        <category term="META" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="etc." scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="domains" label="domains" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.autobugfix.com/">
        <![CDATA[<h2>Chronology of an extraordinary programming contest</h2>

<h3>Thursday, 15.10.2009, 16:14</h3>
<p>
An unspectacular working day is drawing to a close. An announcement from the
<a href="http://denic.de" target="_blank">DENIC</a> members' mailing list appears in the inbox.
It mentions a ruling by the BGH (Federal Court of Justice) and new domain guidelines. Since a court decision
is said to have ordered the registration of the domain vw.de, comparable domain names that
were not permitted so far are to be released as well: one- and two-character domains, numeric domains,
as well as existing top-level domains and German vehicle registration codes.
</p>
<p>
That means: no more restrictions, apart from the maximum length and the
permitted character set, plus the internationally common rule that domain names
must not begin or end with a hyphen.
</p>
<p>
It is clear that quite a few of these names will be traded for five- to
six-figure sums.
</p>
<p>
The third paragraph then reveals what makes this message really explosive.
You have to look twice: registration of the new domains opens on
&quot;25.10.2009, 9:00 (CEST)&quot;. First come, first served.
</p>
<p>
A prompt query from one member:
25.10. is a Sunday and, moreover, the day on which the clocks are switched from CEST to CET
at 3:00. Is DENIC aware of that? And is CEST really what is meant?
</p>

<h3>Thursday, 15.10.2009, 16:43</h3>
<p>
Ah yes, an official correction follows:
the start date is Friday, 23.10.2009, 9:00 CEST. That is one week from now.
</p>
<p>
Meanwhile, the managing director shows up in the developers' office for a spontaneous crisis meeting.
We have to do two things at once, and very quickly:
</p>
<ul>
<li>Prepare a landrush for our customers so that they can pre-register their orders.</li>
<li>Prepare our own participation in the allocation on 23.10.</li>
</ul>
<p>
While we are still talking, a web form is set up where customers can pre-register
the domain names they want. Here, too: first come, first served.
We agree to open it on Tuesday, 20.10.2009, 12:00 CEST. An
urgent newsletter has to be written. All other projects are put on hold.
</p>

<h3>Friday, 16.10.2009, 14:43</h3>
<p>
A first internal test version of the pre-registration form and the corresponding
database structure is finished. It is supposed to be visible to customers from Monday.
</p>
<p>
Meanwhile, the first development steps for taking part in the DENIC landrush are under way.
Registration is to be done via <a href="http://en.wikipedia.org/wiki/Simple_Mail_Transfer_Protocol" target="_blank">SMTP</a> using <a href="http://en.wikipedia.org/wiki/Pretty_Good_Privacy" target="_blank">PGP-signed</a> e-mails in MRIv2 format. We
stopped using DENIC's SMTP interface long ago in favour of
the real-time interface RRI (XML over a dedicated TCP protocol).
So the first task is to learn MRIv2. We study the technical documentation:
what exactly do the request mails have to look like?
</p>

<h3>Friday, 16.10.2009, approx. 16:00</h3>
<p>
The first MRIv2 mails are sent and processed successfully, at first simply
updates for existing test domains. Our PGP key works, but you have to watch the encoding with <a href="http://en.wikipedia.org/wiki/Internationalized_domain_name" target="_blank">IDN domains</a>.
</p>
<p>
We will submit all pre-registrations with fixed contact handles and a fixed delegation to
DENIC nameservers in order to avoid unnecessary problems. Successful
registrations can be changed to the desired entries later.
So at least it is now clear what exactly the requests have to look like.
</p>
<p>
Meanwhile, a heated discussion has flared up on the members' mailing list.
How come? Wasn't the BGH ruling already handed down on 29.09.2009? Oh, I see: DENIC's
BGH lawyer was officially notified of it on 15.10.2009.
</p>
<p>
The technical documentation for the planned landrush is now being changed almost hourly.
Confusion reigns everywhere. Where should the mails be sent? What about a
test environment? How can it be reached, and from when? And how exactly is the policy
for delivering mails to be understood in the first place? One dedicated IP address
per member and then at most 4 mails per minute. But what exactly does that mean?
</p>

<h3>Monday, 19.10.2009, 10:00</h3>
<p>
Our dedicated IP address for taking part in the landrush is registered.
This allows us to take part in test operation.
</p>
<p>
By now, bitter disputes and confused queries are raging on the mailing list.
At the same time, legal action is being threatened on all sides,
feared unfair advantages are denounced, and details of the SMTP protocol
are discussed.
</p>

<h3>Monday, 19.10.2009, from 16:00</h3>
<p>
Confusingly, the documentation explicitly points out the possibility
of SMTP pipelining, although that can at best serve
to send the same mail to different recipients in one bulk? That
is pointless here, because what you would need instead is to send different mails as quickly as possible
to the same recipient. In fact, first tests show that apparently
you cannot even send several mails within the same TCP connection:
</p>
<pre class="code">
&quot;421 too many messages in this connection&quot;
</pre>
<p>
All in all, it is unclear how exactly the <a href="http://en.wikipedia.org/wiki/Access_control_list" target="_blank">ACL</a> works. Sometimes you can send more
than 4 mails per minute, then again fewer, possibly depending
on how many attempts you make.
A query about this to DENIC support waits in vain for an answer.
</p>

<h3>Tuesday, 20.10.2009, 10:00</h3>
<p>
Our pre-registration form undergoes a company-internal landrush test. No
technical problems.
</p>

<h3>Tuesday, 20.10.2009, 12:00</h3>
<p>
Our pre-registration form is opened to customers. Within a few seconds,
about 25,000 requests come in while we anxiously watch the web server's CPU
hit the ceiling. After a few minutes the situation calms down. For the time being we have
accepted everything unchecked, just to make sure we do not wrongly reject legitimate requests.
That could not be put right afterwards, since the order would then no longer be clear.
</p>
<p>
After removing duplicates and invalid domain names, about 2,500 pre-registrations
remain.
</p>

<h3>Tuesday, 20.10.2009, 13:39</h3>

<p>
Apparently DENIC itself is not yet entirely clear about what exactly 4 mails
per minute is supposed to mean:
</p>
<p class="quote">
&quot;[...] We are currently working on optimising the mail ACL.
Please note that this may lead to adjustments which you will have to
take into account at short notice. [...]&quot;
</p>
<p>
Regardless of the ACL, it is clear at any rate that the algorithm to be implemented
for our landrush tool depends above all on two things:
</p>
<ul>
<li>A start time that is as exact as possible (which point in time within the SMTP dialog counts?)</li>
<li>Requesting as few domains as possible during the run whose unavailability
can already be determined.</li>
</ul>
<p>
So we run two threads in parallel: one that tries to send the mails from the queue
(by whatever strategy), and one that, in parallel, marks futile requests by checking
availability ahead of time.
</p>
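<p>
The split into a sending thread and a checking thread could be organised around one shared,
lock-protected queue. The following Python sketch is not the tool described here; the class and
function names, the callbacks <span class="code">is_taken</span> and <span class="code">send</span>,
and the look-ahead of four names are assumptions made for illustration.
</p>

```python
import threading


class LandrushQueue:
    """Pending domain names shared by the sender and the checker thread."""

    def __init__(self, domains):
        self._lock = threading.Lock()
        self._pending = list(domains)   # kept in first come, first served order
        self._taken = set()             # names the checker found to be registered

    def mark_taken(self, domain):
        with self._lock:
            self._taken.add(domain)

    def upcoming(self, count):
        """Return the next `count` names that are not yet known to be taken."""
        with self._lock:
            free = [d for d in self._pending if d not in self._taken]
            return free[:count]

    def pop_next(self):
        """Remove and return the next name worth sending, or None when done."""
        with self._lock:
            while self._pending:
                domain = self._pending.pop(0)
                if domain not in self._taken:
                    return domain
            return None


def check_once(queue, is_taken, lookahead=4):
    """One pass of the checker: test the names that would be sent next."""
    for domain in queue.upcoming(lookahead):
        if is_taken(domain):
            queue.mark_taken(domain)


def run_checker(queue, is_taken, stop):
    """Checker thread body: keep re-checking until the stop event is set."""
    while not stop.is_set():
        check_once(queue, is_taken)


def run_sender(queue, send):
    """Sender thread body: submit every name that is not marked as taken."""
    while True:
        domain = queue.pop_next()
        if domain is None:
            break
        send(domain)
```

<p>
In a real run, <span class="code">run_checker</span> and <span class="code">run_sender</span> would each be
started as a <span class="code">threading.Thread</span>, with <span class="code">is_taken</span> performing
the availability query and <span class="code">send</span> submitting the request mail.
</p>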

<h3>Tuesday, 20.10.2009, 19:42</h3>
<p>
For the first time, DENIC publishes exact information about how the ACL is implemented. In particular,
the algorithm has been switched from the previously used
&quot;<a href="http://en.wikipedia.org/wiki/Sliding_window" target="_blank">sliding-window method</a>&quot;,
which was based on time windows calculated in an opaque way from extrapolated sending speed,
to the so-called &quot;<a href="http://en.wikipedia.org/wiki/Rate_limiting" target="_blank">rate-limiting method</a> based on the <a href="http://en.wikipedia.org/wiki/Token_bucket" target="_blank">token-bucket algorithm</a>&quot;.
Accordingly, at most 4 mails per real-time minute are actually supposed to be accepted at any time, regardless
of the number of sending attempts.
</p>
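<p>
How a token bucket produces such a limit can be sketched in a few lines of Python. The parameters
below (a bucket holding 4 tokens that refills at 4 tokens per minute) are an assumption chosen to
match &quot;4 mails per minute&quot;; the actual server-side configuration is not known here.
</p>

```python
class TokenBucket:
    """Minimal token bucket: each accepted mail costs one token."""

    def __init__(self, capacity, refill_per_second, now=0.0):
        self.capacity = capacity
        self.rate = refill_per_second
        self.tokens = float(capacity)   # start with a full bucket
        self.last = now

    def allow(self, now):
        """Return True if a mail arriving at time `now` (in seconds) is accepted."""
        elapsed = now - self.last
        self.last = now
        # refill according to the elapsed time, but never above the capacity
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False                    # rejected attempts cost nothing


if __name__ == "__main__":
    bucket = TokenBucket(capacity=4, refill_per_second=4 / 60)
    print([bucket.allow(now=0.0) for _ in range(6)])
    # [True, True, True, True, False, False]
    print(bucket.allow(now=16.0))       # True
```

<p>
Because a rejected attempt does not consume a token, hammering the server with additional
attempts does not change how many mails get through.
</p>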

<h3>Wednesday, 21.10.2009, 08:00</h3>
<p>
Actually, we had planned to fly to Frankfurt today for DENIC's technical meeting.
Two flights had been booked long ago and will now go unused. No time for meetings.
</p>

<h3>Wednesday, 21.10.2009, 08:55</h3>
<p>
We take part in a mass test for the first time. As in the planned live operation, the
mail server is already started at 08:55, and mails arriving before 09:00 are accepted at first
and then answered with a corresponding error message.
</p>
<p>
For the landrush programmer, this procedure is more pleasant than the otherwise frequently
used method of accepting connections only from the start time on, since with the latter you can easily
slide into a timeout at start time and, besides, that behaviour
is hard to simulate. In general, this landrush feels more like a
sophisticated strategy game than the usual mass sprints, where the point is to hammer
the server in parallel with as many connections as possible.
</p>

<h3>Wednesday, 21.10.2009, 09:00</h3>
<p>
What is surprising about the mass test compared with normal test operation is, first of all, that the mail server
apparently buckles considerably under the onslaught of fewer than 300 members with one connection each.
The 4-mails-per-minute policy turns out to be a farce, since
the SMTP dialogs drag on so sluggishly that individual mails already take 20-30
seconds.
It is also striking that connections are rejected repeatedly:
</p>
<pre class="code">
&quot;421 Too many concurrent SMTP connections from this IP address&quot;,
</pre>
<p>
even though the previous connection had indeed been closed and the closing had also been
confirmed by the server.
</p>

<h3>Wednesday, 21.10.2009, 10:08</h3>
<p>
The first reply mails from the registration system arrive after just under an hour.
Members report unexpected replies such as
</p>
<pre class="code">
&quot;ERROR: 56000000013 Technical error&quot;
</pre>
<p>
and other strange phenomena. Moreover, the database does not seem to have been reset before the test, or
only partially. Something seems to be going badly wrong
there.
</p>

<h3>Wednesday, 21.10.2009, 10:08</h3>
<p>
DENIC announces another mass test for 15:00.
</p>

<h3>Wednesday, 21.10.2009, from 11:00</h3>
<p>
In this test we did not get a single interesting domain. We will use the time until the
afternoon to add the second thread for checking availability.
</p>
<p>
We determine the availability of domains via RRI queries. That seems to be somewhat more efficient
and more reliable than <a href="http://en.wikipedia.org/wiki/Whois" target="_blank">whois</a>.
For domains that are still available but do not comply with the previous DENIC guidelines,
DENIC replies with an error concerning the domain name:
</p>
<pre class="code">
Invalid format of value for keyword &quot;Domain&quot;
</pre>
<p>
We therefore interpret every reply that does not say the domain is taken
as &quot;available&quot;.
</p>
<p>
With the help of a special test table containing, in random order, the one- and two-character domains on the one hand
and numeric domains that are guaranteed to be available on the other, the parallel operation
of querying and sending can also be tested outside the mass tests, since this provides a mixed queue
of taken and free domains. Before each
sending interval, our query thread picks four free domains from the list, as shortly before the sending window as possible,
in order to make the result as reliable as possible. Of course, it remains questionable
how up to date the reply is in the first place.
</p>

<h3>Wednesday, 21.10.2009, 15:00</h3>
<p>
Starting signal for the mass test, take two. We are well prepared now.
Since the completion of the DATA command is what determines the receipt timestamp,
the first SMTP dialog should be started before the actual start time.
It is unclear, however, which point in time between sending the terminating &quot;.&quot;
at the end of the DATA command and the server's reply ultimately counts, especially as this
time span turns out to be unusually long. Here it is a full 6 seconds, and we
receive the final &quot;250 OK&quot; a good 5 seconds after 15:00, which is
presumably far too late to have a chance at the first names.
</p>

<h3>Wednesday, 21.10.2009, 15:08</h3>
<p>
All the more astonishing that the first mails are answered with &quot;Too early&quot;,
which means they reached the server before 15:00?
</p>
<p>
Then comes another &quot;Too early&quot;, and another. Protests on the
mailing list quickly make it clear: the test has failed on the server side,
everything is answered with &quot;Too early&quot;.
</p>

<h3>Wednesday, 21.10.2009, 15:41</h3>
<p>
DENIC apologises: the starting signal had accidentally been set to November.
Another mass test is announced for 16:00.
</p>

<h3>Mittwoch, 21.10.2009, 16:00 Uhr</h3>
<p>
Startschuss Massentest, die Dritte. Diesmal kommen zwar keine &quot;Too early&quot;-Antworten,
dafür werden aber fortwährend Verbindung abgwiesen:
</p>
<pre class="code">
421 Too many concurrent SMTP connections from this IP address; please try again later.
</pre>
<p>
This is odd, because we definitely open and close a separate connection for
each mail. It then takes up to one second each time before a new connection
can be established. On the server side, different services seem to disagree
temporarily about whether a connection still exists.
</p>
<p>
Conclusion: the server's behavior is decidedly non-deterministic, and one gets
the impression that the registration system is still at a rather early stage of
development and that the tests serve the development not only of the clients
but also of the server.
</p>
<p>
The most important observation, however, is that we still send too many mails in vain,
because the availability check progresses considerably more slowly during the mass tests
than in normal operation. Our check loop always picks the next 4 free domains
from the queue, which takes about 4 seconds in normal operation but up to
20 seconds under mass-test conditions.
</p>

<h3>Wednesday, 21 October 2009, 19:06</h3>
<p>
DENIC now announces two more mass tests for tomorrow after all, at
9:00 and at 13:00.
</p>
<p>
This evening is used to improve the availability-query strategy.
We will no longer start looking for 4 free domains at a fixed time before the
next send window begins, as before. Instead, the thread will search for free
domains from the top of the list over and over, practically without pause. The
domains next in line are thus checked again and again, but the
check interval becomes independent of the practically unpredictable
send window.
</p>
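<p>
The revised strategy can be pictured as a worker that sweeps the queue from the front without pausing and always publishes its most recent result. The following Python sketch is only an illustration under assumptions: <code>check_available</code> stands in for whatever availability query the tool actually issued, and the names <code>sweep</code> and <code>AvailabilityPoller</code> are invented for this example.
</p>

```python
import threading
from typing import Callable, List, Sequence


def sweep(queue: Sequence[str],
          check_available: Callable[[str], bool],
          wanted: int = 4) -> List[str]:
    """Walk the queue from the front and return the first `wanted` free domains."""
    free: List[str] = []
    for domain in queue:
        if check_available(domain):
            free.append(domain)
            if len(free) == wanted:
                break
    return free


class AvailabilityPoller:
    """Hypothetical helper: re-check the head of the queue continuously."""

    def __init__(self, queue, check_available, wanted=4):
        self._queue = list(queue)
        self._check = check_available  # assumed availability query
        self._wanted = wanted
        self._lock = threading.Lock()
        self._latest: List[str] = []
        self._stop = threading.Event()

    def run(self) -> None:
        # Sweep again and again; the sender reads the newest result
        # whenever its own (unpredictable) send window opens.
        while not self._stop.is_set():
            result = sweep(self._queue, self._check, self._wanted)
            with self._lock:
                self._latest = result

    def latest(self) -> List[str]:
        with self._lock:
            return list(self._latest)

    def stop(self) -> None:
        self._stop.set()
```

<p>
The sending thread would call <code>latest()</code> just before each delivery, so the age of the answer depends only on how long one sweep takes, not on when the send window happens to open.
</p>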
<p>
The second open issue is the exact start time. To hit the extremely
time-critical start optimally, one would really have to program the SMTP dialog directly,
beginning the first mail sufficiently early and delaying the terminating &quot;.&quot;
at the end of the DATA command until the exact start time. There is too little
time for that, however, and the SMTP library we use naturally does not
support such specialized timing.
</p>
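<p>
For illustration only, the idea of holding back the terminating &quot;.&quot; might look like the Python sketch below. It is not what was used at the time. It assumes the SMTP dialog has already been driven by hand up to the server's 354 reply on a raw socket; the function name <code>finish_data_at</code> and its parameters are made up for this example, and whether a server would tolerate the pause without timing out is unknown.
</p>

```python
import time
from typing import Callable


def finish_data_at(sock,
                   body: bytes,
                   release_at: float,
                   now: Callable[[], float] = time.time,
                   sleep: Callable[[float], None] = time.sleep) -> float:
    """Send the message body early, then hold back the final "." line.

    `sock` is any object with a sendall() method (e.g. a connected socket).
    `body` must already be dot-stuffed and end with CRLF.
    `release_at` is a Unix timestamp taken from the same clock as `now`.
    Returns the clock reading right after the terminator was sent.
    """
    sock.sendall(body)

    # Wait in short slices so the release happens close to the target time.
    while True:
        remaining = release_at - now()
        if remaining <= 0:
            break
        sleep(min(remaining, 0.01))

    # The line containing only "." ends the DATA command.
    sock.sendall(b".\r\n")
    return now()
```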
<p>
Today's working day lasts until 23:00, and tomorrow's will begin at 8:00.
</p>

<h3>Thursday, 22 October 2009, 08:00</h3>
<p>
Preparation for today's first mass test. On the members' list, details of the
ACL are still being discussed, and some are calling for the registration date
to be postponed.
</p>

<h3>Thursday, 22 October 2009, from 09:00</h3>
<p>
The mass test starts. The SMTP server now seems to respond even more slowly than
yesterday. In addition, roughly every third mail is rejected with &quot;Too many concurrent SMTP connections&quot;.
Watching the log file, one is basically glad to get any mails
delivered at all.
</p>

<h3>Thursday, 22 October 2009, 09:56</h3>
<p>
DENIC concedes that the problem of rejected connections is reproducible:
</p>
<p class="quote">
&quot;[...] Our tests have shown that the process for terminating the
connection takes longer than usual under load. To the outside, the connection
is confirmed as closed, but the mail server is in fact still
closing it. [...]&quot;
</p>
<p>
So one is simply supposed to keep retrying until a connection
is available again.
</p>
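<p>
A retry loop of this kind could be sketched in Python as follows. This is an assumption-laden illustration, not the code that was actually used: <code>connect_with_retry</code> is a hypothetical helper, and it relies on the fact that Python's <code>smtplib.SMTP</code> raises <code>SMTPConnectError</code> when the server greets with anything other than 220, such as the 421 reply quoted above.
</p>

```python
import smtplib
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(connect: Callable[[], T],
                       max_attempts: int = 50,
                       delay: float = 0.2,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `connect` until it succeeds, retrying only on a 421 greeting."""
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except smtplib.SMTPConnectError as exc:
            # 421 = service not available / too many connections: try again.
            # Anything else, or running out of attempts, is passed on.
            if exc.smtp_code != 421 or attempt == max_attempts:
                raise
            sleep(delay)
    raise ValueError("max_attempts must be at least 1")
```

<p>
A caller would pass something like <code>lambda: smtplib.SMTP(host, 25, timeout=10)</code>, where <code>host</code> is a placeholder for the mail server's name.
</p>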

<h3>Thursday, 22 October 2009, 10:10</h3>
<p>
DENIC announces measurements from the mass test. Up to 3422 mail delivery
attempts per minute were recorded from a total of 156 different IP addresses.
If only 4 mails per minute were delivered from each IP,
the total would be just 624.
</p>
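<p>
The arithmetic behind these figures is easy to check. The short Python snippet below uses only the numbers quoted by DENIC; the variable names are chosen for this example. Note that the 3422 are attempts, including rejected ones, so the ratio is only a rough indication of how far clients exceeded the nominal limit.
</p>

```python
# Figures as reported by DENIC for the mass test.
ip_addresses = 156
attempts_per_minute = 3422
acl_limit_per_ip = 4  # nominal limit: 4 mails per minute per IP

# Upper bound if every IP stayed within the limit.
ceiling_if_compliant = ip_addresses * acl_limit_per_ip

# Average number of attempts per IP, and the factor above the ceiling.
attempts_per_ip = attempts_per_minute / ip_addresses
overshoot_factor = attempts_per_minute / ceiling_if_compliant

print(ceiling_if_compliant)          # 624
print(round(attempts_per_ip, 1))     # 21.9
print(round(overshoot_factor, 1))    # 5.5
```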
<p>
Of course, probably nobody still tries to send only 4 mails per minute,
having noticed that, on the one hand, 5 or 6 mails are sometimes accepted
within a minute outside the mass tests and, on the other hand,
delivering a single mail during the mass tests can take up to 20 seconds.
</p>
<p>
Given that the test announced for 13:00 will probably be the
last, the announcement that the server system will be &quot;further optimized&quot;
does not bode well. One may once again have to adapt to changed server
behavior without any further opportunity to test. The test system is then to be
shut down entirely at 15:00. But nobody wants to implement changes
that can no longer be tested.
</p>

<h3>Thursday, 22 October 2009, 12:51</h3>
<p>
DENIC issues a new statement on the 4-mails-per-minute ACL:
</p>
<p class="quote">
&quot;[...] the counter is incremented at the moment the envelope of the
mail has been written.&quot;
</p>
<p>
Unfortunately, several seconds can evidently pass between the completion
of the DATA command and the subsequent &quot;250 OK&quot; confirmation. When within this
interval the envelope is written cannot be determined by the client.
Exact timing with respect to the ACL is therefore simply impossible. One has to
try to get rid of mails one after another, regardless of whether connections
are refused in between or delivery is rejected by the ACL; there is no other way.
</p>

<h3>Thursday, 22 October 2009, 13:00</h3>
<p>
The last mass test starts. This time we have moved our start time 2 seconds
earlier in order to at least roughly sound out the right start time.
Should the first request be answered with &quot;Too early&quot;, we will switch back to
the previous value.
</p>
<p>
It becomes clear immediately: this time no connection is established at all. The
server sends no banner; apparently all connections are already being rejected
at the firewall.
</p>

<h3>Thursday, 22 October 2009, 13:06</h3>
<p>
The abort of the mass test due to network problems is announced. A
new test is scheduled for 14:00.
</p>

<h3>Thursday, 22 October 2009, 13:48</h3>
<p>
The mass test planned for 14:00 is canceled because the network problems
persist.
</p>

<h3>Thursday, 22 October 2009, 14:42</h3>
<p>
DENIC reports that an attempt to work around the problem of rejected
connections has failed. The configuration used in the previous tests
is being restored.
Two more mass tests are now to take place at 15:30 and 16:30.
</p>

<h3>Thursday, 22 October 2009, 15:26</h3>
<p>
DENIC announces that the 15:30 test cannot start because
additional time is needed to roll back to the earlier solution.
Information on the new schedule is to follow shortly.
</p>

<h3>Thursday, 22 October 2009, 15:55</h3>
<p>
DENIC announces the further schedule: new mass tests at 16:30 and 17:00.
Shutdown of the test environment at 17:30.
</p>

<h3>Thursday, 22 October 2009, 16:33</h3>
<p>
The next attempt has started. Delivering mails works again now,
but sometimes takes more than 30 seconds.
We abort after a short time so that we can reset our test environment in time
for the last test at 17:00.
</p>

<h3>Thursday, 22 October 2009, 17:00</h3>
<p>
The last test has started. We have now moved the start time so far forward
that several requests are answered with &quot;Too early&quot;, from which the
best possible start time should now be roughly guessable.
</p>
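<p>
One way to read such a test run is to bracket the start time between the latest attempt that was still &quot;Too early&quot; and the earliest one that was not. The Python sketch below is a hypothetical illustration: the function <code>bracket_start_time</code> and the sample offsets in the usage note are invented, and because the server-side moment that counts is unknown, the result can only be a rough guess.
</p>

```python
from typing import Iterable, Optional, Tuple


def bracket_start_time(
    replies: Iterable[Tuple[float, str]]
) -> Tuple[Optional[float], Optional[float]]:
    """Bracket the usable start offset from observed server replies.

    Each item is (offset, reply): `offset` is the client-side time in
    seconds relative to the official start (negative = before it) at which
    the terminating "." was sent, `reply` is the server's answer.
    Returns (latest offset answered "Too early", earliest offset that was not).
    """
    latest_too_early: Optional[float] = None
    earliest_other: Optional[float] = None
    for offset, reply in replies:
        if "too early" in reply.lower():
            if latest_too_early is None or offset > latest_too_early:
                latest_too_early = offset
        elif earliest_other is None or offset < earliest_other:
            earliest_other = offset
    return latest_too_early, earliest_other
```

<p>
With invented observations such as <code>[(-3.0, "Too early"), (-1.5, "Too early"), (0.4, "250 OK")]</code> the function returns <code>(-1.5, 0.4)</code>, meaning the best start offset lies somewhere in between.
</p>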

<h3>Friday, 23 October 2009, 08:00</h3>
<p>
One hour to go until the start. Our tool is switched to live operation.
All that matters now is not to overlook any setting in the
excitement. The right start time must be set,
the right database loaded, and the right server addressed.
</p>
<p>
Using a prepared tool, the PGP-signed request mails that will have to be sent later
are generated for the pending orders.
</p>

<h3>Friday, 23 October 2009, 08:30</h3>
<p>
DENIC reports that the domains e.de, f.de, g.de, x.de, y.de, z.de, br.de, dw.de, hr.de and sr.de
are excluded from registration due to interim injunctions served at
short notice.
Just early enough to mark these orders as blocked in the database.
</p>

<h3>Friday, 23 October 2009, 08:38</h3>
<p>
DENIC announces that not only vw.de but also o2.de has already been
registered on the basis of court decisions.
</p>

<h3>Friday, 23 October 2009, 08:55</h3>
<p>
Final check of the checklist. The configuration has been switched to live operation.
Automatic synchronization with third-party time sources is suspended so that the
system time is synchronized exclusively by DENIC's NTP server.
The domain robot and all services that usually establish RRI connections to DENIC
are stopped, because only a single RRI connection is permitted.
</p>

<h3>Friday, 23 October 2009, 08:57</h3>
<p>
Our tool is started. It loads the contents of the database table into RAM,
establishes the TCP connection to the mail server, and from then on waits for the start time.
</p>
<p>
The smoking ban in the developers' office is temporarily lifted.
</p>

<h3>Friday, 23 October 2009, 08:59</h3>
<p>
The start of the first SMTP dialog is followed eagerly on the console. The
DATA block is initiated half a second before 9:00:
</p>
<pre class="code">
2009-10-23 08:59:59,449 &gt;&gt; RCPT TO:&lt;auto@nd.denic.de&gt;
2009-10-23 08:59:59,459 &lt;&lt; 250 Accepted
2009-10-23 08:59:59,479 &gt;&gt; DATA
2009-10-23 08:59:59,520 &lt;&lt; 354 Enter message, ending with &quot;.&quot; on a line by itself
</pre>
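<p>
The &quot;half a second&quot; can be read directly off the log timestamps. As a small illustration, the Python snippet below parses two of the lines shown above and computes their distance to 09:00:00; the helper name <code>seconds_before</code> is made up for this example.
</p>

```python
from datetime import datetime

# Timestamp layout used in the log excerpt (comma before the milliseconds).
LOG_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def seconds_before(target: datetime, log_line: str) -> float:
    """Return how many seconds before `target` the log line was written."""
    stamp = " ".join(log_line.split()[:2])  # date and time fields only
    return (target - datetime.strptime(stamp, LOG_FORMAT)).total_seconds()


start = datetime(2009, 10, 23, 9, 0, 0)
data_sent = seconds_before(start, "2009-10-23 08:59:59,479 >> DATA")
go_ahead = seconds_before(start, "2009-10-23 08:59:59,520 << 354 Enter message")

print(round(data_sent, 3))  # 0.521
print(round(go_ahead, 3))   # 0.48
```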

<h3>Friday, 23 October 2009, 09:00</h3>
<p>
It takes 6 seconds until the first DATA block is confirmed. We will therefore
hardly get through with the first domain, but had we started earlier, we could
just as easily have run into &quot;Too early&quot;. A game of chance, as
so often in life, and once again without the jackpot.
</p>
<p>
Everything else is automatic. Order after order is sent off sluggishly, while
at 30 seconds past 9:00 the second thread starts as planned to check the
next domains for availability in parallel.
</p>
<p>
The uncertainties in this race are great. For example, our request
for the domain 1.de, which we send at 09:00:42, is unsuccessful even though the
availability of this domain was still confirmed at 09:00:38. How much time passes on
DENIC's side between accepting an order and the actual registration of a domain,
including its display in the RRI, nobody knows.
</p>

<h3>Friday, 23 October 2009, 09:02</h3>
<p>
As will become known later, 85 domains are already registered at this point,
i.e. the actual race for the important names is as good as over. Nevertheless,
all domains are still shown as available. Only at 09:02:34 is a
domain first reported with the status &quot;connect&quot;.
</p>
<p>
The race for the remaining places will continue throughout the day.
All in all, 8% of our orders are successful. That is not such a
bad rate. Other members were more successful because they joined together in
pools and optimized their lists among themselves. Apart from that,
conditions were essentially equally difficult for all members.
</p>
<p>
The winners of the whole event are Sedo in any case and, above all, the lawyers,
who can now enter their own race. In the course of the day it becomes known, for example,
that in a small house with many mailboxes somewhere in Florida,
a surprising number of companies with one- and two-character initials have suddenly been founded.
And other stories.
</p>
<p>
We won our race, because we made the best of the
circumstances. Anything more would have been luck.
</p>]]>
        
    </content>
</entry>

<entry>
    <title>Kalkulator kolorów DHTML</title>
    <link rel="alternate" type="text/html" href="http://www.autobugfix.com/2009/10/kalkulator-kolorow-dhtml.html" />
    <id>tag:www.autobugfix.com,2009://3.10</id>

    <published>2009-10-17T13:44:37Z</published>
    <updated>2013-05-01T20:43:09Z</updated>

    <summary>wow, the one and only legendary good old DHTML Color Calculator has been translated into Polish :-)</summary>
    <author>
        <name>admin</name>
        <uri>http://www.autobugfix.com</uri>
    </author>
    
        <category term="META" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="etc." scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="javascript" label="JavaScript" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.autobugfix.com/">
        <![CDATA[<p>
wow, the <a href="http://dev.ipv1.net/drafts/coca/" target="_blank">one and only legendary good old DHTML Color Calculator</a> has been translated into Polish :-)
</p>
]]>
        <![CDATA[<p><a href="http://dev.ipv1.net/drafts/coca/" target="_blank"><img alt="kalkulator.jpg" src="http://www.autobugfix.com/2009/10/17/kalkulator.jpg" width="582" height="509" style="margin:10px"></a></p>

<p>
Actually, I can&#8217;t read <a href="http://www.webmaster.twoja.org/index.php?option=com_content&amp;view=article&amp;id=241&amp;Itemid=117" target="_blank">the Pomocnic webmastera site</a>, but it&#8217;s a great honor anyway.
</p>

<p>
Well, I really should set up a new, more contemporary online color calculator soon. This one was my first web experiment back in 2001, compatible with Netscape 4 and IE5, lol.
</p>
]]>
    </content>
</entry>

<entry>
    <title>shorten by regex</title>
    <link rel="alternate" type="text/html" href="http://www.autobugfix.com/2009/10/shorten-by-regex.html" />
    <id>tag:www.autobugfix.com,2009://3.11</id>

    <published>2009-10-15T20:35:50Z</published>
    <updated>2013-04-02T10:15:05Z</updated>

    <summary>While customizing my mt blog, I was wondering how to abbreviate long entry titles in particular places in a nice way. Well, mt provides template tag modifiers such as trim-to, but my aim was to do it more nicely, i.e. replacing everything after the first three (or any other maximum) words with an ellipsis.</summary>
    <author>
        <name>admin</name>
        <uri>http://www.autobugfix.com</uri>
    </author>
    
        <category term="MT hacks" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Perl" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="mt" label="mt" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="programming" label="programming" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="regex" label="RegEx" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.autobugfix.com/">
        <![CDATA[<p>
While customizing my <a href="http://beta-blog.net">mt blog</a>, I was wondering how to abbreviate long entry titles in particular places in a nice way. Well, mt provides template tag modifiers such as <a rel="nofollow" target="_blank" href="http://www.movabletype.org/documentation/appendices/modifiers/trim-to.html">trim-to</a>, but my aim was to do it more nicely, i.e. replacing everything after the first three (or any other maximum) words with an ellipsis. For instance,
<collapsible>
<pre class="code">
&quot;more than three words&quot;
</pre>
</collapsible>
should be shortened to 
<collapsible>
<pre class="code">
&quot;more than three ...&quot;
</pre>
</collapsible>
while a three-word-sentence should remain as it is.
</p>

<p>
The solution is a simple regex, of course. In Perl style,
<collapsible>
<pre class="code">
s/^(\S+(?:\s+\S+){2})(\s+\S+)+/$1 .../
</pre>
</collapsible>
does the job, because it does not match titles of three words or fewer. Equivalently, within mt template tags:
<collapsible>
<pre class="code">
&lt;$mt:EntryTitle regex_replace="/^(\S+(?:\s+\S+){2})(\s+\S+)+/","$1&amp;ensp;&amp;hellip;"$&gt;
</pre>
</collapsible>
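For readers who want to try the pattern outside Perl or mt, here is a rough Python equivalent. It is just a sketch: the function name <code>shorten</code> and the <code>max_words</code> parameter are my own additions for illustration, and the only change to the regex is that the repetition count is made configurable.

```python
import re


def shorten(title: str, max_words: int = 3, ellipsis: str = "...") -> str:
    """Keep the first `max_words` words and replace the rest by an ellipsis.

    Titles with `max_words` words or fewer do not match the pattern and
    are returned unchanged.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    # First word, then (max_words - 1) further words, then one or more
    # extra words that get replaced.
    pattern = r"^(\S+(?:\s+\S+){%d})(?:\s+\S+)+" % (max_words - 1)
    return re.sub(pattern, lambda m: m.group(1) + " " + ellipsis, title, count=1)


print(shorten("more than three words"))  # more than three ...
print(shorten("just three words"))       # just three words
```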
</p>]]>
        
    </content>
</entry>

</feed>
