<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress.com" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>parser &#171; WordPress.com Tag Feed</title>
	<link>http://wordpress.com/tag/parser/</link>
	<description>Feed of posts on WordPress.com tagged "parser"</description>
	<pubDate>Fri, 08 Aug 2008 00:13:46 +0000</pubDate>

	<generator>http://wordpress.com/tags/</generator>
	<language>en</language>

<item>
<title><![CDATA[Building a LISP Parser (Delphi)]]></title>
<link>http://pebbie.wordpress.com/?p=144</link>
<pubDate>Sun, 03 Aug 2008 01:44:56 +0000</pubDate>
<dc:creator>pebbie</dc:creator>
<guid>http://pebbie.wordpress.com/?p=144</guid>
<description><![CDATA[A few days ago I wrote a parser for LISP. More precisely, LISP-like syntax, because I ]]></description>
<content:encoded><![CDATA[<p>A few days ago I wrote a parser for LISP. More precisely, for a <em>LISP-like syntax</em>, because I wrote it to parse CLIPS (C Language Integrated Production System), a shell for building knowledge-based systems (expert systems). The parser is used to build a desktop (GUI) front end for expert systems written in CLIPS.</p>
<p>The code is not optimized yet. The processing is split into two classes, LISPLexer and LISPParser, while the structure that holds the list representation itself is a single class, LISPNode. OK, let's jump to the code, starting with LISPNode.<br />
<!--more--></p>
<p><strong>TLISPNode</strong><br />
In LISP, lists are actually written as <em>S-expressions</em>. So lists like these</p>
<pre><code>(a b)
(a b c)
(a (b))
</code></pre>
<p>are, in S-expression (dotted-pair) notation, actually written like this</p>
<pre><code>(a.(b.NIL))
(a.(b.(c.NIL)))
(a.((b.NIL).NIL))
</code></pre>
<p>Each pair (a.t) is a node in which a is the <em>head</em> and t is the <em>tail</em>. Each node can be an atom, i.e. an atomic value (number, symbol or string), or a list (a pointer to another S-expression, as in the third line). Here is the declaration of the TLISPNode class.</p>
<pre><code><strong>type</strong>
  TLISPNodeType = (ntAtom, ntList);
  TLISPAtomType = (atSymbol, atString, atNumber);
  TLISPNode = <strong>class</strong>
  <strong>protected</strong>
    vtype     : TLISPNodeType;
    Fatomtype  : TLISPAtomType;
    atomval   : string;
    Fchild : TLISPNode;
    Fnext : TLISPNode;
  <strong>public</strong>
    <strong>constructor</strong> Create;
    <strong>destructor</strong> Destroy; <strong>override</strong>; <em>{ release with Obj.Free, which calls Destroy }</em>
    <strong>function</strong> ToString:string;
    <strong>function</strong> Clone : TLISPNode;
    <strong>function</strong> GetHead : TLISPNode;
    <strong>function</strong> GetTail : TLISPNode;
    <strong>property</strong> Child : TLispNode <strong>read</strong> FChild <strong>write</strong> FChild;
    <strong>property</strong> Next : TLispNode <strong>read</strong> FNext <strong>write</strong> FNext;
    <strong>property</strong> Value : string <strong>read</strong> atomval <strong>write</strong> atomval;
    <strong>property</strong> NodeType : TLISPNodeType <strong>read</strong> vtype <strong>write</strong> vtype;
    <strong>property</strong> AtomType : TLISPAtomType <strong>read</strong> FAtomType <strong>write</strong> FAtomType;
  <strong>end</strong>;</code></pre>
<p>The nodes are linked as singly linked lists: each node points to its next sibling and, if it is not an atom, to its first child.</p>
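<p>The correspondence between the two notations and the child/next links can be checked with a small Python sketch (an illustration, not part of the Delphi source; the names <code>LispNode</code>, <code>from_python</code> and <code>to_dotted</code> are hypothetical). Each node is one (head.tail) pair: the head is either the atom's value or, for a list node, the chain hanging off <code>child</code>, and the tail is the chain that continues through <code>next</code>.</p>

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class LispNode:
    """Hypothetical Python mirror of TLISPNode: `child` points to the first
    element of a list node, `next` to the following sibling."""
    value: Optional[str] = None           # atom value (None for list nodes)
    is_list: bool = False                 # plays the role of NodeType = ntList
    child: Optional["LispNode"] = None
    next: Optional["LispNode"] = None


def from_python(items: List[Union[str, list]]) -> Optional[LispNode]:
    """Build a child/next chain from a nested Python list such as ["a", ["b"]]."""
    head = None
    for item in reversed(items):          # build back to front so `next` is known
        if isinstance(item, list):
            node = LispNode(is_list=True, child=from_python(item))
        else:
            node = LispNode(value=str(item))
        node.next = head
        head = node
    return head


def to_dotted(node: Optional[LispNode]) -> str:
    """Render a sibling chain as nested (head.tail) pairs, ending in NIL."""
    if node is None:
        return "NIL"
    head = to_dotted(node.child) if node.is_list else node.value
    return f"({head}.{to_dotted(node.next)})"


print(to_dotted(from_python(["a", ["b"]])))   # (a.((b.NIL).NIL))
```

<p>The three lists from the start of this section produce exactly the three dotted forms shown above, which is a quick way to see that <code>next</code> plays the role of the tail.</p>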
<p><strong>TLISPLexer</strong><br />
The LISPLexer class splits a string into labeled pieces (tokens). This splitting and labeling is also called lexical analysis. Strictly speaking it is a kind of <em>parsing</em> too, but with a simpler goal (no complicated syntax tree). The interface of the lexer is shown in the following snippet.</p>
<pre><code><strong>const</strong>
  HTAB = #9;
  CRLF = #13#10; <em>{ carriage return (#13), then line feed (#10) }</em>
  LBRA = '(';
  RBRA = ')';
  DQUO = '"';
  S_ALPHA = ['a'..'z','A'..'Z'];
  S_DIGIT = ['0'..'9'];
  S_ALNUM = S_ALPHA+S_DIGIT;
  S_WHITE = [' ',HTAB,#10,#13];

<strong>type</strong>
  TTokenType = (ttString, ttSymbol, ttNumber, ttLBRACKET, ttRBRACKET);
  TToken = <strong>record</strong>
    tokentype : TTokenType;
    sym : string;
  <strong>end</strong>;

  TLispLexer = <strong>class
  private</strong>
    <strong>function</strong> GetCount: integer;
  <strong>protected</strong>
    FToken : <strong>array of</strong> TToken;
    FPita : string;
    FIdx : integer;
    <strong>function</strong> GetToken(tid : integer):TToken;
    <strong>procedure</strong> Proses;
    <strong>procedure</strong> skipwhite;
    <strong>function</strong> iswhite(c:char):boolean;
    <strong>function</strong> isalpha(c:char):boolean;
    <strong>function</strong> isalnum(c:char):boolean;
    <strong>function</strong> isdigit(c:char):boolean;
    <strong>procedure</strong> SetPita(const Value: string);
    <strong>procedure</strong> add_token(toktype: TTokenType; tokval:string);
  <strong>public
    constructor</strong> Create;<strong>overload</strong>;
    <strong>constructor</strong> Create(inputstring: string);<strong>overload</strong>;
    <strong>destructor</strong> Destroy; <strong>override</strong>; <em>{ release with Obj.Free, which calls Destroy }</em>
    <strong>procedure</strong> Clear;
    <strong>function</strong> dumptoken(t:ttoken):string;
    <strong>function</strong> dump:string;
    <strong>property</strong> Token[tid : integer]: TToken <strong>read</strong> GetToken;
    <strong>property</strong> Pita: string <strong>read</strong> FPita <strong>write</strong> SetPita;
    <strong>property</strong> Count : integer <strong>read</strong> GetCount;
  <strong>end</strong>;
</code></pre>
<p>Fortunately, LISP tokens are quite simple, because the only separators between 'words' are <strong>whitespace</strong> and the parentheses '(' and ')'. The part worth showing here is the Proses procedure.</p>
<pre><code><strong>procedure</strong> TLispLexer.Proses;
<strong>var</strong>
  cur_kata : string;
  cc : char;
  cur_token : TTokenType;
<strong>begin</strong>
  <strong>if</strong> length(FPita)=0 <strong>then exit</strong>;
  FIdx := 1;
  cur_kata := '';
  cur_token := ttLBRACKET;
  skipwhite;
  <strong>while</strong> (FIdx &#60;= length(FPita)) <strong>do begin</strong>
    cc := FPita[FIdx];
    <strong>case</strong> cc <strong>of</strong>
      LBRA:add_token(ttLBRACKET, cc);
      RBRA:add_token(ttRBRACKET, cc);
      DQUO:<strong>begin</strong>
        inc(FIdx);
        cur_kata := '';
        <strong>while</strong> (FIdx &#60;= length(FPita)) <strong>and</strong> (FPita[FIdx] &#60;&#62; DQUO) <strong>do begin</strong>
          cur_kata := cur_kata + FPita[FIdx];
          inc(FIdx);
        <strong>end</strong>;
        add_token(ttString, cur_kata);
      <strong>end</strong>;
      'a'..'z','A'..'Z':<strong>begin</strong>
        cur_kata := '';
        <strong>while</strong> (FIdx &#60;= length(FPita)) and (FPita[FIdx] in S_ALNUM+['-']) <strong>do begin</strong>
          cur_kata := cur_kata + FPita[FIdx];
          inc(FIdx);
        <strong>end</strong>;
        dec(FIdx);
        add_token(ttSymbol, cur_kata);
      <strong>end</strong>;
      '0'..'9','-':<strong>begin</strong>
        cur_kata := '';
        <strong>while</strong> (FIdx &#60;= length(FPita)) <strong>and</strong> (FPita[FIdx] in S_DIGIT+['-','e','E','.']) <strong>do begin</strong>
          cur_kata := cur_kata + FPita[FIdx];
          inc(FIdx);
        <strong>end</strong>;
        dec(FIdx);
        add_token(ttNumber, cur_kata);
      <strong>end</strong>;
    <strong>end</strong>;
    inc(FIdx);
    skipwhite;
  <strong>end</strong>;
<strong>end</strong>;
</code></pre>
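<p>The branch structure of Proses may be easier to follow in a compact form. Below is a Python transliteration (a sketch for illustration, not part of the downloadable source); token kinds are plain strings instead of TTokenType values, and <code>str.isspace</code> stands in for the S_WHITE set. One consequence worth noticing applies to both versions: characters that match no branch, such as <code>?</code> and <code>=</code> (CLIPS uses them in variables like <code>?x</code> and in <code>=&#62;</code>), are silently skipped, and a lone <code>-</code> is labeled as a number.</p>

```python
import string
from typing import List, Tuple

ALPHA = set(string.ascii_letters)
DIGITS = set(string.digits)
ALNUM = ALPHA | DIGITS
NUMBER_CHARS = DIGITS | set("-eE.")   # same character set as the Delphi number loop


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Split LISP-like text into (kind, value) tokens using the same branches
    as TLispLexer.Proses: brackets, "strings", symbols and numbers."""
    tokens: List[Tuple[str, str]] = []
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if c.isspace():                        # skipwhite
            i += 1
        elif c in "()":
            tokens.append(("lbracket" if c == "(" else "rbracket", c))
            i += 1
        elif c == '"':                         # everything up to the closing quote
            j = text.find('"', i + 1)
            if j == -1:                        # unterminated: take the rest of the input
                j = n
            tokens.append(("string", text[i + 1:j]))
            i = j + 1
        elif c in ALPHA:                       # symbol: letter, then letters/digits/'-'
            j = i
            while j < n and (text[j] in ALNUM or text[j] == "-"):
                j += 1
            tokens.append(("symbol", text[i:j]))
            i = j
        elif c in DIGITS or c == "-":          # number: digits, '-', 'e', 'E', '.'
            j = i
            while j < n and text[j] in NUMBER_CHARS:
                j += 1
            tokens.append(("number", text[i:j]))
            i = j
        else:                                  # no matching case branch: skipped
            i += 1
    return tokens
```

<p>For example, <code>tokenize('(?x)')</code> yields only a bracket, the symbol <code>x</code> and a bracket, so a CLIPS-oriented lexer would probably need an extra branch for such symbol characters.</p>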
<p><strong>TLISPParser</strong><br />
Once the character string has been split and labeled into a list of tokens, the next step is to translate this token sequence into the corresponding list structure. Here is the declaration of the LISPParser class. The main method is the <strong>ParseNode</strong> function, which is invoked through the general-purpose <strong>Parse</strong> function.</p>
<pre><code><strong>type</strong>
  TLISPParser = <strong>class
  protected</strong>
    FLexer : TLISPLexer;
    FTokenPtr : integer;
    <strong>function</strong> GetToken:TToken;
    <strong>function</strong> ParseNode:TLISPNode;
    <strong>procedure</strong> NextToken;
  <strong>public
    function</strong> Parse(LISPString:string):TLISPNode;
  <strong>end</strong>;
</code></pre>
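<p>The Parse function is the entry point: it does the setup and runs the lexical analysis, then hands over to ParseNode, which works recursively. Parse expects the token list to start with '(' and skips it, so the node it returns is the first element of the top-level list.</p>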
<pre><code><strong>function</strong> TLISPParser.Parse(LISPString:string): TLISPNode;
<strong>begin</strong>
  result := nil;
  FLexer := LISP_Lexer;
  FLexer.Clear;
  FLexer.SetPita(LISPString);
  FTokenPtr := 0;
  <strong>if</strong> (FLexer.Count &#60;&#62; 0) <strong>and</strong> (GetToken.tokentype=ttLBRACKET) <strong>then begin</strong>
    NextToken;
    result := ParseNode;
  <strong>end</strong>;
<strong>end</strong>;

<strong>function</strong> TLISPParser.ParseNode: TLISPNode;
<strong>var</strong>
  tmp : TLISPNode;
  t : TToken;
<strong>begin</strong>
  result := nil;
  <em>{ proses car }</em>
  t := GetToken;
  <strong>case</strong> t.tokentype <strong>of</strong>
    <em>{ this is an empty list }</em>
    ttRBRACKET:exit;

    <em>{ car is an atom }</em>
    ttSYMBOL, ttSTRING, ttNumber:<strong>begin</strong>
      <em>{ check if nil }</em>
      <strong>if</strong> t.sym = '' <strong>then </strong>exit;
      <em>{ normal atom value }</em>
      result := TLISPNode.Create;
      result.NodeType := ntAtom;
      <strong>case</strong> t.tokentype <strong>of</strong>
        ttSYMBOL:result.atomtype := atSymbol;
        ttString:result.atomtype := atString;
        ttNumber:result.atomtype := atNumber;
      <strong>end</strong>;
      result.Value := t.sym;
    <strong>end</strong>;

    <em>{ car is a list }</em>
    ttLBRACKET:<strong>begin</strong>
      result := TLISPNode.Create;
      result.NodeType := ntList;
      NextToken;
      <em>{ parse child }</em>
      tmp := ParseNode;
      <em>{ configure node }</em>
      result.Child := tmp;
    <strong>end</strong>;

  <strong>end</strong>; <em>{ case..of }</em>

  <em>{ proses next }</em>
  NextToken;
  <strong>if</strong> FTokenPtr &#60; FLexer.Count <strong>then begin</strong>
    t := GetToken;
    tmp := ParseNode;
    result.Next := tmp;
  <strong>end</strong>;
<strong>end</strong>;
</code></pre>
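<p>The same algorithm can be sketched in Python for readers who want to experiment without Delphi. This is an illustrative re-implementation, not the author's code: the regex tokenizer is a shortcut, and siblings are chained with a loop (recursion is used only for nested lists), which avoids one recursion level per list element. It also consumes each ')' explicitly, reports a missing one, and keeps an empty string literal "" as an atom, whereas the <code>t.sym = ''</code> check in ParseNode treats it as nil.</p>

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    kind: str                        # "list", "symbol", "string" or "number"
    value: str = ""
    child: Optional["Node"] = None   # first element, for "list" nodes
    next: Optional["Node"] = None    # next sibling


# Shortcut regex tokenizer (an assumption; TLispLexer works character by character)
TOKEN_RE = re.compile(r'\s*(?:(\()|(\))|"([^"]*)"|([^\s()"]+))')
NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?")


def tokenize(text: str) -> List[Tuple[str, str]]:
    tokens, pos, end = [], 0, len(text.rstrip())
    while pos < end:
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"cannot tokenize at offset {pos}")
        lbr, rbr, string, atom = m.groups()
        if lbr:
            tokens.append(("lbracket", lbr))
        elif rbr:
            tokens.append(("rbracket", rbr))
        elif string is not None:
            tokens.append(("string", string))
        else:
            tokens.append(("number" if NUMBER_RE.fullmatch(atom) else "symbol", atom))
        pos = m.end()
    return tokens


def parse_list(tokens: List[Tuple[str, str]], pos: int) -> Tuple[Optional[Node], int]:
    """Parse elements up to the matching ')' (the '(' is already consumed).
    Returns the first sibling and the position after the ')'."""
    first = last = None
    while True:
        if pos >= len(tokens):
            raise SyntaxError("missing ')'")
        kind, value = tokens[pos]
        pos += 1
        if kind == "rbracket":
            return first, pos                  # end of this list; None if empty
        if kind == "lbracket":
            child, pos = parse_list(tokens, pos)   # recurse into the nested list
            node = Node("list", child=child)
        else:
            node = Node(kind, value)
        if last is None:
            first = node
        else:
            last.next = node                   # siblings are chained in a loop
        last = node


def parse(text: str) -> Optional[Node]:
    """Like TLISPParser.Parse: expects a leading '(' and returns its first element."""
    tokens = tokenize(text)
    if not tokens or tokens[0][0] != "lbracket":
        return None
    first, _ = parse_list(tokens, 1)
    return first


def to_sexpr(node: Optional[Node]) -> str:
    """Print a sibling chain back as (a b ...)."""
    parts = []
    while node is not None:
        if node.kind == "list":
            parts.append(to_sexpr(node.child))
        elif node.kind == "string":
            parts.append(f'"{node.value}"')
        else:
            parts.append(node.value)
        node = node.next
    return "(" + " ".join(parts) + ")"
```

<p>A round trip such as <code>to_sexpr(parse(src))</code> is a convenient check that the child/next structure was built correctly.</p>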
<p>If you are curious about the full code, you can download it <a href="http://pebbie.net/lispparser.pas">here</a>.</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[A Simple Recursive JSON Parser]]></title>
<link>http://vlasovskikh.wordpress.com/?p=18</link>
<pubDate>Thu, 24 Jul 2008 01:28:09 +0000</pubDate>
<dc:creator>vlasovskikh</dc:creator>
<guid>http://vlasovskikh.wordpress.com/?p=18</guid>
<description><![CDATA[Recently I came across an interesting parsing method, whi]]></description>
<content:encoded><![CDATA[<p>Recently I came across an interesting parsing method which the <a href='5-8459-0189-8'>“Dragon Book”</a> describes, if at all, only in passing. The method is called recursive descent with backtracking. I found an example of it in the <a href='http://funprog-ru.googlecode.com/files/intro2fp-ru-4.pdf'>Russian translation of “Introduction to Functional Programming”</a> by John Harrison. Parsers of this kind are easy to write from scratch in functional languages.</p>
<p>Given a <abbr title='Backus-Naur Form'>BNF</abbr>-like grammar without left recursion, it is easy to write a parsing function for each of its nonterminals. Each such function is a parser and has the type (here and below the language is Python) <code>Sequence(a) -&#62; (b, Sequence(a))</code>, where <code>a</code> is the token type (for example, plain <code>str</code>), <code>b</code> is the type of the parse result (<code>object</code> in general), and <code>Sequence</code> is the type of any sequence (here is <a href='http://www.python.org/dev/peps/pep-3119/#sequences'>one definition</a> of what counts as a sequence). For such parser functions you can define very convenient combinators such as alternation, sequencing and repetition. In general, this is a very telling example of the advantages of functional programming.</p>
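<p>To make that type concrete, here is a minimal Python 3 sketch of parsers as plain functions (my own illustration, not the author's code; the names <code>token</code>, <code>seq</code>, <code>alt</code> and <code>many</code> are hypothetical). Each parser takes a sequence of tokens and returns a <code>(value, rest)</code> pair, or raises an exception when it cannot consume the input.</p>

```python
class NoParse(Exception):
    """Raised when a parser cannot consume the input."""


def token(expected):
    """Parser that accepts one token equal to `expected`.
    Type: Sequence(a) -> (a, Sequence(a))."""
    def parse(tokens):
        if len(tokens) > 0 and tokens[0] == expected:
            return tokens[0], tokens[1:]
        raise NoParse(tokens)
    return parse


def seq(p, q):
    """Sequencing: run p, then q on whatever p left over; the value is a pair."""
    def parse(tokens):
        v1, rest = p(tokens)
        v2, rest = q(rest)
        return (v1, v2), rest
    return parse


def alt(p, q):
    """Alternation with backtracking: if p fails, try q on the same input."""
    def parse(tokens):
        try:
            return p(tokens)
        except NoParse:
            return q(tokens)
    return parse


def many(p):
    """Repetition: apply p zero or more times and collect the values.
    (p must consume input when it succeeds, or this loops forever.)"""
    def parse(tokens):
        values = []
        while True:
            try:
                v, tokens = p(tokens)
            except NoParse:
                return values, tokens
            values.append(v)
    return parse


# "ab" or "ac": the first branch consumes 'a', fails on 'c', and alt backtracks
ab_or_ac = alt(seq(token("a"), token("b")), seq(token("a"), token("c")))
print(ab_or_ac("acx"))             # (('a', 'c'), 'x')
print(many(token("a"))("aab"))     # (['a', 'a'], 'b')
```

<p>Strings work directly as token sequences here because slicing a <code>str</code> returns the rest of the string.</p>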
<p>Using this method, I wrote a JSON parser in Python based on the grammar from <a href='http://tools.ietf.org/html/rfc4627'>RFC 4627</a>. There are known <a href='http://deron.meranda.us/python/comparing_json_modules/'>problems with JSON parsers for Python</a>, but I followed the RFC grammar strictly, so the lexical and syntactic handling should be free of errors. Here is the source code. First, the part that could be turned into a <em>library</em>: a few handy combinators taken from Harrison's book, plus an object wrapper around the combinators that overloads Python operators for the sake of elegance:</p>
<p><!--more--></p>
<pre><code>class NoParse(Exception):
    def __init__(self, tokens, msg=None):
        self.tokens = tokens
        self.msg = msg

    def __str__(self):
        s = 'cannot parse input'
        e = 'unparsed data remains:\n%s' % self.tokens
        msg = ' %s' % self.msg if self.msg is not None else ''
        return '%s:%s\n%s' % (s, msg, e)

class Parser(object):
    def __init__(self, p):
        self.wrapped = p

    def __call__(self, tokens):
        return self.wrapped(tokens)

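    # p + q (sequencing): run self, then other on the remaining tokens;
    # the parsed value is the pair (v1, v2)
    def __add__(self, other):
        @Parser
        def f(tokens):
            v1, r1 = self(tokens)
            v2, r2 = other(r1)
            return (v1, v2), r2
        return f

    # p | q (alternation): if self fails, try other on the same input;
    # the first alternative that succeeds is kept
    def __or__(self, other):
        @Parser
        def f(tokens):
            try:
                return self(tokens)
            except NoParse:
                return other(tokens)
        return f

    # p &#62;&#62; fn: apply fn (the "treatment") to the value parsed by self
    def __rshift__(self, treatment):
        @Parser
        def f(tokens):
            v, r = self(tokens)
            return treatment(v), r
        return f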

@Parser
def finished(tokens):
    if len(tokens) == 0:
        return None, tokens
    else:
        raise NoParse(tokens, 'should have reached eof')

def many(p):
    @Parser
    def f(tokens):
        try:
            v, next = p(tokens)
            vs, rest = many(p)(next)
            return [v] + vs, rest
        except NoParse:
            return [], tokens
    return f

def some(predicate):
    @Parser
    def f(tokens):
        if len(tokens) == 0:
            raise NoParse(tokens, 'no tokens left in the stream')
        else:
            t, ts = tokens[0], tokens[1:]
            if predicate(t):
                return t, ts
            else:
                raise NoParse(tokens, 'token "%s" doesn\'t match against the predicate' % t)
    return f

def a(value):
    return some(lambda t: t == value)

def several(predicate):
    return many(some(predicate))</code></pre>
<p>And now the JSON parser itself. I didn't try to be clever and implemented the parsing head-on, simply following the RFC. As I <a href='http://vlasovskikh.jaiku.com/presence/39297408'>wrote</a> before, the code still turns out clear and concise:</p>
<pre><code>def json_loads(s):
    'str -&#62; object'
    head = lambda (x, xs): x
    from operator import add
    word = lambda w: reduce(add, [a(c) for c in w])

    # RFC 4627 productions, nearly 1-to-1
    ws = several(lambda c: c in ' \t\n\r')
    begin_array = ws + a('[') + ws
    end_array = ws + a(']') + ws
    begin_object = ws + a('{') + ws
    end_object = ws + a('}') + ws
    name_separator = ws + a(':') + ws
    value_separator = ws + a(',') + ws
    false = (word('false')
        &#62;&#62; (lambda _: False))
    true = (word('true')
        &#62;&#62; (lambda _: True))
    null = (word('null')
        &#62;&#62; (lambda _: None))
    unescaped = some(lambda c: any(min &#60;= ord(c) &#60;= max
        for min, max in [(0x20, 0x21), (0x23, 0x5b), (0x5d, 0x10ffff)]))
    escape = a('\\')
    quotation_mark = a('"')
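    hexdig = some(lambda c: c in '0123456789ABCDEFabcdef')
    # after a backslash the input holds a letter (b, f, n, r, t, u),
    # which is translated into the character it stands for
    char = (
          unescaped
        &#124; ((escape + (
              a('"')
            &#124; a('\\')
            &#124; a('/')
            &#124; (a('b') &#62;&#62; (lambda _: unichr(0x0008))) # backspace
            &#124; (a('f') &#62;&#62; (lambda _: unichr(0x000c))) # form feed
            &#124; (a('n') &#62;&#62; (lambda _: '\n'))
            &#124; (a('r') &#62;&#62; (lambda _: '\r'))
            &#124; (a('t') &#62;&#62; (lambda _: '\t'))
            &#124; (a('u') + hexdig + hexdig + hexdig + hexdig # uXXXX
                &#62;&#62; (lambda ((((_, d1), d2), d3), d4):
                   (unichr(int('%s%s%s%s' % (d1, d2, d3, d4), 16)))))
        )) &#62;&#62; (lambda (_, v): v))
    )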
    string = (quotation_mark + many(char) + quotation_mark
        &#62;&#62; (lambda ((_1, cs), _2): ''.join(cs)))
    minus = a('-')
    plus = a('+')
    e = a('e') &#124; a('E')
    decimal_point = a('.')
    digit1_9 = some(lambda c: c in '123456789')
    zero = a('0')
    digit = digit1_9 &#124; zero
    integer = (
          zero
        &#124; (digit1_9 + many(digit)
            &#62;&#62; (lambda (x, xs): ''.join([x] + xs))))
    frac = (decimal_point + digit + many(digit)
        &#62;&#62; (lambda ((_, x), xs): ''.join([x] + xs)))
    # RFC 4627: exp = e [ minus / plus ] 1*DIGIT, so the sign is optional
    exp = (
          (e + minus + digit + many(digit)
            &#62;&#62; (lambda (((_1, _2), x), xs): '-%s' % ''.join([x] + xs)))
        &#124; (e + plus + digit + many(digit)
            &#62;&#62; (lambda (((_1, _2), x), xs): ''.join([x] + xs)))
        &#124; (e + digit + many(digit)
            &#62;&#62; (lambda ((_, x), xs): ''.join([x] + xs)))
    )
    # longest alternatives first: | keeps the first one that succeeds
    number = (
          (minus + integer + frac + exp
            &#62;&#62; (lambda (((_, x), f), e): float('-%s.%se%s' % (x, f, e))))
        &#124; (minus + integer + frac
            &#62;&#62; (lambda ((_, x), f): float('-%s.%s' % (x, f))))
        &#124; (minus + integer + exp
            &#62;&#62; (lambda ((_, x), e): float('-%se%s' % (x, e))))
        &#124; (minus + integer
            &#62;&#62; (lambda (_, x): int('-%s' % x)))
        &#124; (integer + frac + exp
            &#62;&#62; (lambda ((x, f), e): float('%s.%se%s' % (x, f, e))))
        &#124; (integer + frac
            &#62;&#62; (lambda (x, f): float('%s.%s' % (x, f))))
        &#124; (integer + exp
            &#62;&#62; (lambda (x, e): float('%se%s' % (x, e))))
        &#124; (integer
            &#62;&#62; (lambda x: int(x)))
    )
    @Parser
    def value(tokens):
        return (
            false &#124; null &#124; true &#124; object &#124; array &#124; number &#124; string
        )(tokens)
    array = (
          (begin_array + value + many(value_separator + value) + end_array
            &#62;&#62; (lambda (((_1, x), vs), _2):
               ([x] + [v for _, v in vs])))
        &#124; (begin_array + end_array
            &#62;&#62; (lambda _: []))
    )
    member = (string + name_separator + value
        &#62;&#62; (lambda ((k, _), v): (k, v)))
    object = (
          (begin_object + member + many(value_separator + member) + end_object
            &#62;&#62; (lambda (((_1, x), ms), _2):
               (dict([x] + [m for _, m in ms]))))
        &#124; (begin_object + end_object
            &#62;&#62; (lambda _: {}))
    )
    json_text = (object &#124; array) + finished &#62;&#62; head
    return head(json_text(s))</code></pre>
<p>With the library part of the parser in place and a BNF grammar for a language, you can write parsers for many languages fairly easily.</p>
<p>A few notes on usage:</p>
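<p>To illustrate that claim, here is a hedged Python 3 sketch: a trimmed-down port of the library above (Python 3 dropped tuple parameters in lambdas, so values are unpacked inside the lambda bodies, and <code>many</code> uses a loop instead of recursion), applied to a tiny grammar of my own, <code>expr = number ('+' number)*</code>, written with repetition rather than left recursion.</p>

```python
class NoParse(Exception):
    pass


class Parser:
    """Wraps a function tokens -> (value, rest) and overloads +, | and >>."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, tokens):
        return self.fn(tokens)

    def __add__(self, other):            # sequencing: value is the pair (v1, v2)
        def fn(tokens):
            v1, rest = self(tokens)
            v2, rest = other(rest)
            return (v1, v2), rest
        return Parser(fn)

    def __or__(self, other):             # alternation: retry `other` on failure
        def fn(tokens):
            try:
                return self(tokens)
            except NoParse:
                return other(tokens)
        return Parser(fn)

    def __rshift__(self, treatment):     # apply `treatment` to the parsed value
        def fn(tokens):
            v, rest = self(tokens)
            return treatment(v), rest
        return Parser(fn)


def some(predicate):
    def fn(tokens):
        if tokens and predicate(tokens[0]):
            return tokens[0], tokens[1:]
        raise NoParse(tokens)
    return Parser(fn)


def a(value):
    return some(lambda t: t == value)


def many(p):
    def fn(tokens):
        values = []
        while True:                      # a loop instead of recursion: no depth limit
            try:
                v, tokens = p(tokens)
            except NoParse:
                return values, tokens
            values.append(v)
    return Parser(fn)


@Parser
def finished(tokens):
    if tokens:
        raise NoParse(tokens)
    return None, tokens


# Example grammar (hypothetical): expr = number ('+' number)*, number = digit+
digit = some(str.isdigit)
number = digit + many(digit) >> (lambda p: int(p[0] + "".join(p[1])))
expr = number + many(a("+") + number) >> (lambda p: p[0] + sum(n for _, n in p[1]))
calc = expr + finished >> (lambda p: p[0])

print(calc("12+3+40"))   # (55, '')
```

<p>As in the JSON parser, <code>+</code> binds tighter than <code>&#62;&#62;</code> in Python, so each transformation applies to the whole sequence to its left.</p>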
<ul>
<li>The RFC 4627 grammar does not describe tokens separately, so I did not make lexical analysis a separate stage. In general, though, it is more convenient to parse a stream of tokens that carry their type, value and position in the stream.</li>
<li>If the grammar has left recursion, it must be eliminated, because this is an <abbr title='Left to right, Leftmost derivation'>LL</abbr> parser.</li>
<li>Thanks to backtracking, the parser can look ahead any number of tokens, i.e. it is LL(*). Note that <code>|</code> keeps the first alternative that succeeds and does not come back to try the others if something parsed after it fails, so alternatives that share a prefix must be listed longest first, as in <code>number</code>.</li>
<li>When the parsing functions take plain, stateless strings as input, there is no way to determine where an error occurred. Moreover, the whole text being parsed has to be kept in memory. Introducing state spoils the elegance and complicates the code a little. I think these problems can be avoided with <a href='http://www.cs.nott.ac.uk/~gmh/pearl.pdf'>monadic parsers</a>.</li>
</ul>
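<p>The first and last notes fit together: with a separate lexing stage, tokens can carry their position, and a parser can then report where it failed. A hedged Python 3 sketch of such a lexer for JSON (the <code>Token</code> type, the token kinds and the regular expressions are my own simplification; escapes inside strings are not validated):</p>

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str    # "string", "number", "name" or "punct"
    value: str
    pos: int     # offset of the token's first character in the input


# Simplified JSON token patterns (an assumption: string escapes are not validated)
TOKEN_SPEC = [
    ("ws", r"[ \t\n\r]+"),
    ("string", r'"(?:[^"\\]|\\.)*"'),
    ("number", r"-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][-+]?\d+)?"),
    ("name", r"true|false|null"),
    ("punct", r"[{}\[\]:,]"),
]
MASTER = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC))


def tokenize(text: str) -> Iterator[Token]:
    """Yield tokens with their positions; whitespace is dropped."""
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "ws":
            yield Token(m.lastgroup, m.group(), pos)
        pos = m.end()
```

<p>A combinator like <code>some</code> could then include <code>tokens[0].pos</code> in its <code>NoParse</code> message, so the error names an offset instead of dumping the unparsed remainder.</p>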
]]></content:encoded>
</item>
<item>
<title><![CDATA[Segway]]></title>
<link>http://detzi.wordpress.com/?p=108</link>
<pubDate>Sun, 06 Jul 2008 18:51:31 +0000</pubDate>
<dc:creator>detzi</dc:creator>
<guid>http://detzi.wordpress.com/?p=108</guid>
<description><![CDATA[Today in the Olympiapark I once again walked past something like this:

Segway Tours, run in the Olympiap]]></description>
<content:encoded><![CDATA[<p>Today in the Olympiapark I once again walked past something like this:</p>
<p><a href="http://detzi.wordpress.com/files/2008/07/segway1.jpg"><img class="alignnone size-thumbnail wp-image-109" src="http://detzi.wordpress.com/files/2008/07/segway1.jpg?w=76" alt="" width="76" height="96" /></a> <a href="http://detzi.wordpress.com/files/2008/07/segway2.jpg"><img class="alignnone size-thumbnail wp-image-110" src="http://detzi.wordpress.com/files/2008/07/segway2.jpg?w=107" alt="" width="107" height="96" /></a></p>
<p>Segway Tours, run in the Olympiapark. I think I should try that sometime too.</p>
<p>By the way, the 9 million web pages are not online yet after all. Building the database took more than 24 hours, so the job that builds the database, which I had started from home, was killed when my connection went through its forced 24-hour disconnect. So, once more, with nohup...</p>
<p>And now I'm back at my doctoral thesis, and to implement an idea I need a C++ and a Java parser as soon as possible (and really, yesterday already, to trot out that saying once again). And, if possible, a parser for Ruby and Python as well. A quick Google search (was it actually only the press that was forbidden to use the word "googeln"?) for a C++ parser turned up nothing but junk. And the GCC code wouldn't meet any coding guidelines or style I know of either. Writing my own parser for each of the four programming languages is certainly too much work. So there must be something out there. Where are the students when you need them, the ones who usually take care of this sort of thing in their diploma, bachelor's or master's theses or in some student project?</p>
<p>Addendum: Aha! You just have to google properly. Instead of searching for a parser for C++, I should have searched for "parse java code" right away. Apparently <a href="http://www.antlr.org/">Antlr</a> offers the functionality I need, with a very long list of grammars, including ones for C++, Java and Python. Let's see whether it's any good...</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[BBCode-Parser Done]]></title>
<link>http://furm.wordpress.com/?p=9</link>
<pubDate>Wed, 02 Jul 2008 02:26:02 +0000</pubDate>
<dc:creator>jhecht</dc:creator>
<guid>http://furm.wordpress.com/?p=9</guid>
<description><![CDATA[When first beginning Furm, I knew there were going to be some things that I had, or wanted, to do my]]></description>
<content:encoded><![CDATA[<p>When I first started Furm, I knew there would be some things that I had to, or wanted to, do myself. One of them was the Template class, which is essentially done apart from some tweaks here and there. Another was a BBCode system that could be used both client-side and server-side, since I want Furm to be fully customizable without anyone having to overwrite or edit files manually. For many newer users that is hard to do, especially those with very little understanding of FTP or who use server setups. Well, the BBCode parser for the JS side is finished and, yes, it is fully customizable. I'll set up documentation as soon as I get a stable release out, so until then, bear with me.</p>
<p>Now I just have to finish the server-side version of the BBCode parser. I'm trying out a few ideas, and hopefully we'll have a working model here soon.</p>
<p>Hopefully I'll post again, or edit this post, with updates soon.</p>
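<p>The post doesn't show Furm's parser, but one way to make a BBCode parser "fully customizable" is to drive it from a table that maps tag names to HTML templates, so supporting a new tag means adding an entry rather than editing code. This is a hypothetical Python illustration (the post's parser is JavaScript, and <code>TAGS</code> and <code>bbcode_to_html</code> are my own names), not Furm's implementation:</p>

```python
import html
import re

# Hypothetical, user-editable tag table: tag name -> HTML template.
# {0} is the text between the tags, {1} the optional "=argument" (defaults to {0}).
TAGS = {
    "b": "<strong>{0}</strong>",
    "i": "<em>{0}</em>",
    "quote": "<blockquote>{0}</blockquote>",
    "url": '<a href="{1}">{0}</a>',
}


def bbcode_to_html(text, tags=TAGS):
    """Translate [tag]...[/tag] pairs listed in `tags`; anything else stays as text."""
    names = "|".join(re.escape(name) for name in tags)
    # [name] or [name=arg], the shortest body, then the matching [/name]
    pattern = re.compile(r"\[(" + names + r")(?:=([^\]]*))?\](.*?)\[/\1\]", re.DOTALL)

    def replace(m):
        name, arg, body = m.group(1), m.group(2), m.group(3)
        return tags[name].format(body, arg if arg is not None else body)

    text = html.escape(text)          # user input never reaches the page unescaped
    while True:                       # repeat so nested tags are translated as well
        new = pattern.sub(replace, text)
        if new == text:
            return new
        text = new
```

<p>Passing <code>{**TAGS, "u": "&lt;u&gt;{0}&lt;/u&gt;"}</code> would add an underline tag without touching the function. A production version would also need to restrict the URL schemes accepted in <code>[url=...]</code>.</p>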
<p>Cheers,</p>
<p>-Jhecht</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[Sunday Short Takes]]></title>
<link>http://socialinteraction.wordpress.com/?p=27</link>
<pubDate>Mon, 23 Jun 2008 02:45:50 +0000</pubDate>
<dc:creator>swarmsync</dc:creator>
<guid>http://socialinteraction.wordpress.com/?p=27</guid>
<description><![CDATA[1. I&#8217;m annoyed with the editors of the Globe and Mail&#8217;s Number Cruncher series.  They ]]></description>
<content:encoded><![CDATA[<p>1. I'm annoyed with the editors of the <a title="The Globe and Mail" href="http://www.theglobeandmail.com/" target="_self">Globe and Mail's</a> <a title="Number Cruncher" href="http://www.theglobeandmail.com/blogs/numbercruncher" target="_self">Number Cruncher</a> series. They deliver their stories as blog posts, but no one on the editorial team bothers to respond to readers. Maybe I am just the first person ever to ask a question and they are unsure what to do, but come on: after making me go through an annoying sign-up process, you'd think someone could at least say "Can't help ya mate". The real tragedy is that it's an excellent series.</p>
<p>2. I haven't seen much innovation in user interfaces for help desk applications. I wonder whether a riff on the basic <a title="David Allen, Getting Things Done" href="http://www.davidco.com/" target="_self">Getting Things Done</a> UIs I have seen would work and be helpful.</p>
<p>3. With all the unstructured data out there, I wonder whether a simple graphical tool designed to turn this data into actionable information would be helpful.</p>
<p>What I have in mind is a visual environment for building parsing workflows by drag and drop, where one could use predefined rules (or create new ones) to parse a document and extract the needed information in a structured format.</p>
<p>So, for instance, you could create a "parser" for a series of emails that always have essentially the same structure and get that data into a database.</p>
<p>Hmm, maybe I need to think more about this; it may actually be quite useful...</p>
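<p>The idea in item 3, user-defined rules that turn similarly structured emails into database rows, can be sketched in a few lines of Python. This is a hypothetical illustration: the rule names, the sample message and the table layout are invented, each rule is simply a regular expression with one capturing group, and an in-memory SQLite database stands in for the real one.</p>

```python
import re
import sqlite3
from email import message_from_string

# Hypothetical user-defined rules: field name -> regex with one capturing group
RULES = {
    "order_id": r"^Order:\s*(\d+)",
    "customer": r"^Customer:[ \t]*(.+)$",
    "total": r"^Total:\s*([\d.]+)",
}


def extract(raw_email, rules=RULES):
    """Apply every rule to the body of a plain-text email; missing fields are None."""
    msg = message_from_string(raw_email)
    body = msg.get_payload()
    record = {"subject": msg["Subject"]}
    for field, pattern in rules.items():
        m = re.search(pattern, body, re.MULTILINE)
        record[field] = m.group(1).strip() if m else None
    return record


def store(records, conn):
    """Insert the extracted records; the columns mirror RULES plus the subject."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders "
                 "(subject TEXT, order_id TEXT, customer TEXT, total TEXT)")
    conn.executemany("INSERT INTO orders VALUES "
                     "(:subject, :order_id, :customer, :total)", records)
    conn.commit()


SAMPLE = """Subject: New order
From: shop@example.com

Order: 1042
Customer: Jane Doe
Total: 19.99
"""

conn = sqlite3.connect(":memory:")
store([extract(SAMPLE)], conn)
print(conn.execute("SELECT order_id, customer, total FROM orders").fetchall())
# [('1042', 'Jane Doe', '19.99')]
```

<p>A graphical tool of the kind described would essentially be an editor for the <code>RULES</code> table.</p>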
<p>4. Another interesting blog post today on <a title="/Message Post" href="http://feeds.feedburner.com/~r/stoweboyd/wpeL/~3/317386661/connecting-the.html" target="_self">/Message</a>. Stowe is right: at every turn and with every interaction we are exposing ourselves to loosely coupled, information-rich data flows, and we need to be able to mine these flows to extract <a title="Internal Link" href="http://socialinteraction.wordpress.com/2008/06/07/acquiring-actionable-information/" target="_self">actionable information</a>; otherwise we are stuck with static silos. The information is out there; we just need better means of extracting it from the crushing weight of all the data we see every day.</p>
<p>5. Another <a title="OpenID" href="http://openid.net/" target="_self">OpenID</a> <a title="Email to OpenID" href="http://emailtoid.net/" target="_self">mechanism</a>.  From the site "Emailtoid is a simple mapping service that enables the use of email addresses as OpenID identifiers."</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[Extrapol source code available (not)]]></title>
<link>http://dutherenverseauborddelatable.wordpress.com/?p=96</link>
<pubDate>Tue, 17 Jun 2008 16:08:16 +0000</pubDate>
<dc:creator>yoric</dc:creator>
<guid>http://dutherenverseauborddelatable.wordpress.com/?p=96</guid>
<description><![CDATA[A quick note to inform you that the repository for Extrapol is now public. The source code as availa]]></description>
<content:encoded><![CDATA[<p style="text-align:justify;">A quick note to inform you that the <a href="http://">repository</a> for Extrapol is now public. The source code as available on the repository does not have a licence yet and will not compile as such, due to dependencies on libraries available <a href="http://forge.ocamlcore.org/projects/batteries/">somewhere else</a>. Stay tuned for an actual release.</p>
<p style="text-align:justify;"><strong>Update:</strong> Sorry, repository cut off by the administrator. I'll inform you when the sources are back.</p>
<p style="text-align:justify;">Note rapide pour vous informer que le <a href="https://www.sds-project.fr/svn/extrapol/trunk/specs/ml">code source</a> d'Extrapol est maintenant disponible au public. Il ne s'agit pas encore d'une version officielle -- en particulier, le code n'a pas encore de licence et il manque des bibliothèques (<a href="http://forge.ocamlcore.org/projects/batteries/">disponibles ailleurs</a>). Plus de détails dès qu'une version officielle est disponible.</p>
<p style="text-align:justify;"><strong>Additif:</strong> Désolé, je viens d'apprendre que le dépôt de source a été isolé par l'administrateur. Je vous tiendrai au courant dès que le code source est de nouveau public.</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[crawler with Beautiful Soup]]></title>
<link>http://fiorix.wordpress.com/?p=51</link>
<pubDate>Thu, 12 Jun 2008 06:46:32 +0000</pubDate>
<dc:creator>alef</dc:creator>
<guid>http://fiorix.wordpress.com/?p=51</guid>
<description><![CDATA[Today I needed to write some programs to extract currency exchange rates from the Banco Central. So, to make it eas]]></description>
<content:encoded><![CDATA[<p>Today I needed to write some programs to extract <a href="http://www5.bcb.gov.br/pec/taxas/port/ptaxnpesq.asp?id=txcotacao">currency exchange rates from the Banco Central</a>. So, to make things easier, I wrote a very generic class that does the basic work of going to the website, doing the GET or POST, and returning the content.</p>
<p>Check out how nice this is:</p>
<pre><code># coding: utf-8
# 2008-06-11 AF
# crawler.py

import urllib, httplib

class crawler:
    # subclasses override these class attributes
    data = ''
    host = None
    method = 'GET'
    params = {}
    headers = {}
    request = '/'

    def __call__(self, fetch_anyway=False):
        if not self.host:
            raise ValueError('You must provide at least the host')

        if not self.data:
            # make the connection
            fd = httplib.HTTPConnection(self.host)

            # encode the params: GET sends them in the query string,
            # POST sends them as a form-encoded body
            params = urllib.urlencode(self.params)
            headers = dict(self.headers)
            if self.method == 'GET':
                path = self.request + ('?' + params if params else '')
                body = None
            else:
                path, body = self.request, params
                headers.setdefault('Content-Type',
                                   'application/x-www-form-urlencoded')
            fd.request(self.method, path, body, headers)

            # get the response
            response = fd.getresponse()
            self.status, self.reason = response.status, response.reason

            # read the data (only on 200 OK, unless fetch_anyway is set)
            if self.status == 200 or fetch_anyway:
                self.data = response.read()

            fd.close()

        return self.data
</code></pre>
<p>Basically, if I create a class that inherits from it and set only the <em>host</em>, it already works. After all that work, I decided to write a relatively simple program that extracts the performance of the clubs in the 2008 Campeonato Brasileiro, using the <a href="http://www.gazetaesportiva.net/campeonatos/futebol/nacional/2008/brasileirao/conteudo/desempenho.php">Gazeta Esportiva</a> site.</p>
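<p>On Python 3, <code>httplib</code> became <code>http.client</code> and <code>urlencode</code> moved to <code>urllib.parse</code>. The same subclass-and-override design might then look like the hedged sketch below; it is a port, not the author's code. The request is built by a separate <code>build_request()</code> method (my addition) so it can be inspected without touching the network, and <code>urllib.request.urlopen</code> replaces the manual connection handling, which means HTTP error statuses raise <code>urllib.error.HTTPError</code> instead of returning empty data.</p>

```python
import urllib.parse
import urllib.request


class Crawler:
    """Subclasses override these class attributes, as with the original `crawler`."""
    host = None
    method = "GET"
    request = "/"
    params: dict = {}
    headers: dict = {}

    def build_request(self) -> urllib.request.Request:
        """Build (but do not send) the HTTP request described by the attributes."""
        if not self.host:
            raise ValueError("You must provide at least the host")
        query = urllib.parse.urlencode(self.params)
        url = f"http://{self.host}{self.request}"
        data = None
        if self.method == "GET":
            if query:
                url += "?" + query               # GET parameters go in the URL
        else:
            data = query.encode("ascii")         # POST parameters go in the body
        return urllib.request.Request(url, data=data, headers=dict(self.headers),
                                      method=self.method)

    def __call__(self, timeout: float = 30.0) -> bytes:
        """Send the request and return the response body."""
        with urllib.request.urlopen(self.build_request(), timeout=timeout) as response:
            return response.read()
```

<p>A subclass that only sets <code>host</code> and <code>request</code>, as in the program below, is then fetched by calling an instance, e.g. <code>fetch()()</code>.</p>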
<p>To parse the content quickly and easily, I used <a href="http://www.crummy.com/software/BeautifulSoup/">Beautiful Soup</a>. Even though my Palmeiras isn't doing well, what with all that hot air from Luxa, it's nice to see the performance table, especially since Corinthians isn't in it, and even more so after today's defeat against Sport. :)</p>
<p>Here is the program (with plenty of comments):</p>
[sourcecode language='python']#!/usr/bin/env python
# coding: utf-8

import re, sys
from crawler import crawler
from BeautifulSoup import BeautifulSoup

class fetch(crawler):
    host = 'www.gazetaesportiva.net'
    request = '/campeonatos/futebol/nacional/2008/brasileirao/conteudo/desempenho.php'

if __name__ == '__main__':
    # create the crawler
    doc = fetch()

    # fetch the document
    buffer = doc()
    if not buffer:
        print doc.status, doc.reason
        sys.exit(1)

    # lists of team names and of their positions per round
    name_list = []
    data_list = []

    # build the soup, converting HTML entities into
    # unicode characters: aacute becomes "á"
    soup = BeautifulSoup(buffer, convertEntities='html')

    # find the team names
    name = soup.findAll('strong')

    # after finding the names, compile this regular
    # expression, which strips the tags
    # NOTE: Beautiful Soup may already do this and I just don't know :p
    junk = re.compile('</?strong>', re.IGNORECASE)

    # find the teams' data (by the º character, U+00BA)
    data = soup.findAll(text=re.compile(u'\\d{1,2}\xba'))

    # build the list of names, stripping the tags
    # and any other junk
    for n in name:
        text = junk.sub('', str(n)).strip()
        if text: name_list.append(text)

    # build the per-team list of positions; the data
    # must split evenly among the teams
    if not name_list or not data or len(data) % len(name_list):
        print 'Oops! Problem with the data.'
        sys.exit(1)

    div = len(data) // len(name_list)
    for n in range(0, len(data), div):
        data_list.append(data[n:n+div])

    # print everything :)
    for k, v in zip(name_list, data_list):
        print k, ', '.join(v)[/sourcecode]
<p>Beautiful Soup is simple and reasonably fast. It also has many features for parsing content efficiently, which spares you the tedious work of writing a huge parser for every HTML page you want to process.</p>
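<p>Beautiful Soup 3 only runs on Python 2. Without it, roughly the same extraction can be sketched with Python 3's standard <code>html.parser</code>. This sketch makes the same assumptions as the program above: team names are the text of <code>strong</code> elements, and positions are text nodes made of one or two digits followed by º. The sample markup in the test is hypothetical, not the real page.</p>

```python
import re
from html.parser import HTMLParser

# Assumed format of a position: one or two digits followed by º (U+00BA)
ORDINAL = re.compile("\\d{1,2}\u00ba")


class StandingsParser(HTMLParser):
    """Collect team names (text inside strong tags) and position text nodes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # entities arrive decoded
        self.names, self.positions = [], []
        self._in_strong = False

    def handle_starttag(self, tag, attrs):
        if tag == "strong":
            self._in_strong = True

    def handle_endtag(self, tag):
        if tag == "strong":
            self._in_strong = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_strong and text:
            self.names.append(text)
        elif ORDINAL.search(text):
            self.positions.append(text)


def group_by_team(names, positions):
    """Split the flat list of positions into one equal chunk per team."""
    if not names or not positions or len(positions) % len(names):
        raise ValueError("positions do not split evenly among the teams")
    size = len(positions) // len(names)
    chunks = [positions[i:i + size] for i in range(0, len(positions), size)]
    return dict(zip(names, chunks))
```

<p>The chunking step mirrors the original's <code>div</code> loop and also rejects empty input.</p>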
<p>If you run the program, it prints this:</p>
<pre>$ python desempenho-futebol.py
Atlético/MG 11º, 13º, 15º, 7º, 12º
Atlético/PR 6º, 5º, 5º, 9º, 5º
Botafogo 2º, 10º, 9º, 15º, 9º
Cruzeiro 2º, 2º, 1º, 1º, 2º
Coritiba 2º, 8º, 7º, 8º, 13º
Figueirense 9º, 4º, 12º, 11º, 16º
Flamengo 1º, 3º, 2º, 2º, 1º
Fluminense 11º, 17º, 19º, 20º,  20º
Goiás 13º, 14º, 16º, 17º, 19º
Grêmio 6º, 6º, 3º, 5º, 4º
Internacional 6º, 11º, 13º, 13º, 17º
Ipatinga 14º, 20º, 20º, 16º, 15º
Náutico 5º, 1º, 4º, 3º, 3º
Palmeiras 18º, 12º, 10º, 6º, 10º
Portuguesa 9º, 16º, 18º, 19º, 14º
Santos 17º, 7º, 14º, 14º, 17º
São Paulo 14º, 15º, 17º, 18º, 11º
Sport 18º, 17º, 11º, 10º, 6º
Vasco 14º, 8º, 7º, 4º, 7º
Vitória 18º, 17º, 6º, 12º, 8º</pre>
<p>If you copy and paste the code, remember that this damned WordPress CSS keeps messing with the quotes. To get around that, use <a href="http://fiorix.wordpress.com/codigo/">the workarounds I published here</a>.</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[2nd Meeting Minutes (05/06/08)]]></title>
<link>http://ximbicaprogramming.wordpress.com/?p=5</link>
<pubDate>Mon, 09 Jun 2008 14:10:57 +0000</pubDate>
<dc:creator>Tadeu Martins</dc:creator>
<guid>http://ximbicaprogramming.wordpress.com/?p=5</guid>
<description><![CDATA[Clients present: Prof. Julio, Eduardo
Group members present: Carlos, Rodrigo, Tadeu
The meet]]></description>
<content:encoded><![CDATA[<p>Clients present: Prof. Julio, Eduardo</p>
<p>Group members present: Carlos, Rodrigo, Tadeu</p>
<p>The meeting began at 8:15 and dealt with requirements elicitation. The clients stated that the work must follow the standard set by the legacy software, but must not be a mere "copy-and-paste" of it. They also specified that the system must manage the user, but should disregard the possibility of more than one user.</p>
<p>The group asked whether <a href="http://dinosaur.compilertools.net/">LEX and YACC</a> could be used to implement the parser. The clients said this was a possible solution. However, since this is a web platform, there are probably better ways to handle the syntax-analysis problem.</p>
<p>The clients also determined that acquiring the minimal vocabulary is the group's responsibility. This is the minimal set of Portuguese words that can define a wide range of concepts unambiguously. They further specified that if no such vocabulary can be found, the group itself must produce it.</p>
<p>At the end of the meeting, the group was also asked to present part of the work on Tuesday, June 10.</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[teh pwn config parser]]></title>
<link>http://wtflolphp.wordpress.com/?p=3</link>
<pubDate>Thu, 05 Jun 2008 22:36:38 +0000</pubDate>
<dc:creator>gizm0bill</dc:creator>
<guid>http://wtflolphp.wordpress.com/?p=3</guid>
<description><![CDATA[made this class (under beta testing atm) which loads config data from pretty special xmls.. thought ]]></description>
<content:encoded><![CDATA[<p>made this class (under beta testing atm) which loads config data from pretty special xmls.. thought zend_config_xml is drunk and stupid</p>
<p>sections can be inherited, you can define attributes and have special attributes to handle the parsing.. some features are not implemented yet..</p>
<p>teh pwnage code:</p>
<pre><big>class Gizmo_Config_Xml {
    public $data;
    private $protectedAttributes = array( "_extends"=&#62;"string","_private"=&#62;"boolean" );

    public function __construct($filename,$section = null){
        if (empty($filename)) {
            throw new Exception('Filename is not set');
        }
        $config = simplexml_load_file($filename);
        if (null === $section)
            foreach ($config as $sectionName =&#62; $sectionData){
                $this-&#62;$sectionName = $sectionData;
            }
    }

    public function __set($var,$val){
        // _processSection() already collects the element's attributes,
        // so the section's data can be stored directly
        $this-&#62;data[$var] = $this-&#62;_processSection($val);
    }

    private function _processSection($xmlObj){
        $config = array();
        if(count($xmlObj-&#62;children())&#62;0){
            foreach($xmlObj-&#62;children() as $k=&#62;$v)
                $config['elements'][$k] = $this-&#62;_processSection($v);
        }
        foreach($xmlObj-&#62;attributes() as $k=&#62;$v)
            $config['attributes'][$k] = (string) $v;
        if(count($xmlObj-&#62;children())==0)
            $config['value'] = (string) $xmlObj;

        return $config;
    }

    private function _toArray($data,$sn){
        $ret_data = array();
        $flag_private = false;
        if(isset($data['attributes']) &#38;&#38; is_array($data['attributes'])){
            foreach($data['attributes'] as $k=&#62;$v)
                if(array_key_exists($k,$this-&#62;protectedAttributes)){
                    // settype() returns false on failure; it does not throw
                    settype($v, $this-&#62;protectedAttributes[$k]);
                    if($k=="_extends") {
                        // inherit the named section; this element's own data wins
                        $ret_data = $this-&#62;_arrayMergeRecursive($this-&#62;data[$v],$data);
                        unset($ret_data['attributes']['_extends']);
                        // don't let the parent's _private flag leak into the child
                        unset($ret_data['attributes']['_private']);
                    }
                    if($k=="_private"){
                        $flag_private = true;
                    }
                }else
                    $ret_data['attributes'][$k] = $v;
        }

        if(isset($data['elements']) &#38;&#38; is_array($data['elements']))
            foreach($data['elements'] as $k=&#62;$v)
                $ret_data['elements'][$k] = $this-&#62;_toArray($v,$k);
        elseif(empty($ret_data))
            $ret_data = $data;

        if(!empty($ret_data['elements']))
            unset($ret_data['value']);
        else
            $ret_data['value'] = isset($data['value']) ? $data['value'] : null;

        return $flag_private ? FALSE : $ret_data;
    }

    private function _arrayMergeRecursive($array1, $array2){
        if (is_array($array1) &#38;&#38; is_array($array2))
            foreach ($array2 as $key =&#62; $value)
                if (isset($array1[$key]))
                    $array1[$key] = $this-&#62;_arrayMergeRecursive($array1[$key], $value);
                else
                    $array1[$key] = $value;
        else
            $array1 = $array2;

        return $array1;
    }

    public function toArray(){
        $ret = array();
        foreach($this-&#62;data as $k=&#62;$v){
            $a = $this-&#62;_toArray($v,$k);
            $a!==false ? $ret[$k] = $a : null;
        }
        return $ret;
    }

}</big></pre>
<p>teh xml example:</p>
<pre>&#60;?xml version="1.0" encoding="UTF-8"?&#62;
&#60;config&#62;
    &#60;_topo _private="true" type="X1:topo"&#62;
        &#60;catId type="long" optional="true"&#62;&#60;/catId&#62;
        &#60;topoId type="long" optional="true" /&#62;
        &#60;netId type="long"&#62;0&#60;/netId&#62;
    &#60;/_topo&#62;

    &#60;_pagi _private="true" type="X2:Pagi" namespace="http://x.com/ws/schema"&#62;
        &#60;cnt type="boolean" /&#62;
        &#60;pageNum type="positiveInteger"&#62;1&#60;/pageNum&#62;
        &#60;ipp type="positiveInteger"&#62;10&#60;/ipp&#62;
    &#60;/_pagi&#62;

    &#60;_sort _private="true" type="X3:Sort" namespace="http://x.com/ws/schema"&#62;
         &#60;sortBy type="string" enum="viewCount,price,averageRating"&#62;enum[0]&#60;/sortBy&#62;
         &#60;sortOrder type="string" enum="asc,desc"&#62;enum[0]&#60;/sortOrder&#62;
    &#60;/_sort&#62;

    &#60;getSomeList&#62;
        &#60;xId type="long" optional="true" /&#62;
        &#60;topology _extends="_topo" /&#62;
        &#60;pagination _extends="_pagi" /&#62;
        &#60;sort _extends="_sort" /&#62;
        &#60;testParam _extends="_sort"&#62;
            &#60;testSubParam type="Qx:test" namespace="y.com"&#62;
                &#60;testSubSubParam type="long" /&#62;
            &#60;/testSubParam&#62;
        &#60;/testParam&#62;
    &#60;/getSomeList&#62;
&#60;/config&#62;</pre>
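<p>The <code>_extends</code>/<code>_private</code> idea does not depend on PHP. Here is a minimal Python sketch of the same inheritance rules using <code>xml.etree.ElementTree</code>. It illustrates the idea and is not a port of the class above. It assumes that sections are the direct children of the root and that an element's own attributes and children override the inherited ones.</p>

```python
import xml.etree.ElementTree as ET

PROTECTED = ("_extends", "_private")  # special attributes, never copied to output


def merge(base, override):
    """Recursively merge dicts; values from `override` win."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(result.get(key), dict) and isinstance(value, dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


def process(elem, sections):
    """Convert an element to nested dicts, resolving _extends against `sections`."""
    node = {"attributes": {k: v for k, v in elem.attrib.items() if k not in PROTECTED}}
    children = list(elem)
    if children:
        node["elements"] = {c.tag: process(c, sections) for c in children}
    else:
        node["value"] = (elem.text or "").strip()
    parent = elem.get("_extends")
    if parent is not None:
        node = merge(process(sections[parent], sections), node)
        if "elements" in node:
            node.pop("value", None)  # inherited children replace the empty value
    return node


def load_config(xml_text):
    """Return every non-private top-level section as nested dicts."""
    root = ET.fromstring(xml_text)
    sections = {child.tag: child for child in root}
    return {name: process(elem, sections)
            for name, elem in sections.items()
            if elem.get("_private") != "true"}
```

<p>Filtering the protected attributes while building each node keeps a template's <code>_private</code> flag out of the sections that extend it.</p>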
<p>tip: it can be used in some nifty soap actions</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[XML madness]]></title>
<link>http://twigleaf.wordpress.com/?p=8</link>
<pubDate>Tue, 27 May 2008 19:10:37 +0000</pubDate>
<dc:creator>twigleaf</dc:creator>
<guid>http://twigleaf.wordpress.com/?p=8</guid>
<description><![CDATA[Since the moment the term came into the world, people have gone crazy and beyond to use the word ]]></description>
<content:encoded><![CDATA[<p>Since the moment the term came into the world, people have gone crazy and beyond to use the word "XML" in every document and sentence they use. To me however it's just another document format that comes in the craziest forms and sizes. To be honest, there doesn't a moment go by where I look at some kind of xml file someone is using for their application and I just want to hit my head against a brick wall out of insanity. You won't believe the amount of crap implementations you'll come across.</p>
<p>But never mind that, because it's not the issue at the moment. The issue is that, so far in my life, I have written two XML parsers: one in Delphi/Object Pascal, and, as of today, one in C++.</p>
<p>And you seriously wouldn't ask me "Why?" if you had actually seen those horrible DOM interfaces whose implementations range from 10 to 50 MB in size.</p>
<p>I wrote them to "just" convert XML-formatted strings or files directly into whatever tree/node/leaf system I happen to be using at the time.</p>
<p>The point is how little code these things take: this C++ one is about 170 lines. Yet when I start writing one (and it happened both times), I don't actually know what I'm doing. I just start writing. I know there will be some if statements, state booleans and position integers, but I'm not 100% aware of what I'm writing. It's like a monkey typing at a keyboard: deleting and retyping, deleting and retyping, and suddenly the code is there and it works.</p>
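<p>For illustration, here is a minimal Python sketch of that kind of hand-rolled parser: a position index, a stack of open elements, and a few branches. It is not the author's C++ code. It deliberately handles only a small subset of XML: elements, text and double-quoted attributes. Comments containing a &#62; character, CDATA sections, entity decoding and single-quoted attributes are not handled.</p>

```python
import re

NAME_RE = re.compile(r"\s*([\w:.-]+)(.*)", re.S)     # tag name, then the rest
ATTR_RE = re.compile(r'([\w:.-]+)\s*=\s*"([^"]*)"')  # name="value" pairs only


class Node:
    def __init__(self, tag, attrs):
        self.tag, self.attrs = tag, attrs
        self.children, self.text = [], ""


def parse_xml(src):
    """Parse a small XML subset into a Node tree and return the root element."""
    doc = Node("#document", {})
    stack = [doc]                      # open elements; the last one is current
    pos = 0                            # position of the next unread character
    while pos < len(src):
        lt = src.find("<", pos)
        if lt == -1:
            lt = len(src)
        text = src[pos:lt].strip()     # text between tags goes to the current element
        if text:
            stack[-1].text += text
        if lt == len(src):
            break
        gt = src.find(">", lt)
        if gt == -1:
            raise ValueError("unterminated tag at %d" % lt)
        inner, pos = src[lt + 1:gt], gt + 1
        if inner.startswith(("?", "!")):
            continue                   # skip declarations and simple comments
        if inner.startswith("/"):      # closing tag must match the current element
            name = inner[1:].strip()
            if len(stack) == 1 or stack[-1].tag != name:
                raise ValueError("unexpected closing tag </%s>" % name)
            stack.pop()
            continue
        self_closing = inner.endswith("/")
        if self_closing:
            inner = inner[:-1]
        m = NAME_RE.match(inner)
        if not m:
            raise ValueError("bad tag at %d" % lt)
        node = Node(m.group(1), dict(ATTR_RE.findall(m.group(2))))
        stack[-1].children.append(node)
        if not self_closing:
            stack.append(node)
    if len(stack) != 1:
        raise ValueError("unclosed tag <%s>" % stack[-1].tag)
    if not doc.children:
        raise ValueError("no root element")
    return doc.children[0]
```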
<p>I seriously don't think any file format should put you into that state of not knowing. Reading a file format should be *almost* as easy as writing the output functions, which, as everyone knows, take about 60 seconds to write for XML.</p>
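<p>The output side really is short. Here is a minimal Python sketch, assuming the tree is a set of nested <code>(tag, attributes, children)</code> tuples with plain strings as text (a structure chosen just for this example). It uses the standard library's escaping helpers.</p>

```python
from xml.sax.saxutils import escape, quoteattr


def to_xml(node):
    """Serialize a (tag, attrs, children) tuple; str children become escaped text."""
    if isinstance(node, str):
        return escape(node)
    tag, attrs, children = node
    attr_text = "".join(" %s=%s" % (k, quoteattr(v)) for k, v in attrs.items())
    if not children:
        return "<%s%s/>" % (tag, attr_text)  # empty elements are self-closed
    inner = "".join(to_xml(child) for child in children)
    return "<%s%s>%s</%s>" % (tag, attr_text, inner, tag)
```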
<p>Anyway, enough ranting. Time to upload it to subversion.</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[Age of Conan: feat calculators, spells, combos and wiki]]></title>
<link>http://ferv0r.wordpress.com/?p=23</link>
<pubDate>Mon, 19 May 2008 22:52:17 +0000</pubDate>
<dc:creator>ferv0r</dc:creator>
<guid>http://ferv0r.wordpress.com/?p=23</guid>
<description><![CDATA[Here a few really useful tools and applications I&#8217;ve run into so far for Age of Conan.  I]]></description>
<content:encoded><![CDATA[<p>Here are a few really useful tools and applications I've come across so far for Age of Conan. I'll try to keep this list updated with the latest and greatest from the Age of Conan community.</p>
<p><strong>Feat Calculators</strong></p>
<p><a href="http://www.conanarmory.com/feat.aspx?id=10">Feat calculator at conanarmory.com</a></p>
<p style="padding-left:30px;">Has a lower browser requirement, which makes it usable for more people.</p>
<p><a href="http://feats.goonheim.com/">Feat calculator at feats.goonheim.com</a></p>
<p style="padding-left:30px;">Most commonly referenced feat calculator in the AoC community.</p>
<p><a href="http://www.tentonhammer.com/aoc/feats?class_id=10">Feat calculator at tentonhammer.com</a></p>
<p><strong>Spell and Combo references</strong></p>
<p><a href="http://www.hybes.org/?s=abilities&#38;class=TempestOfSet&#38;lang=en">Detailed list of all spells and combos at www.hybes.org</a></p>
<p><strong>Maps</strong></p>
<p><a href="http://www.conanarmory.com/search.aspx?browse=3">Zone maps at conanarmory.com</a></p>
<p style="padding-left:30px;">Very slick and useful maps, with user added information.</p>
<p><strong>WIKI</strong></p>
<p><a href="http://aoc.wikia.com/wiki/Age_of_Conan_Wiki">The Age of Conan Wiki at aoc.wikia.com</a></p>
<p><strong>Leveling Flowchart and Overview of the zones</strong></p>
<p><a href="http://www.got3n.com/aoc-leveling-chart">Flowchart at got3n.com</a></p>
<p><strong>User Interface mods</strong></p>
<p><a href="http://aoc.curse.com/downloads/details/12705/">Mirage UI</a></p>
<p style="padding-left:30px;">The most popular UI mod at the moment. It is well done and offers some customization. I have switched to this UI mod and I'm happy with how it works.</p>
<p><a href="http://www.werik.com/">Werik UI</a></p>
<p style="padding-left:30px;">A minimalist approach.  It removes all the decorative frills and makes all the buttons smaller.</p>
<p><a href="http://aoc.curse.com/">Curse Gaming Age of Conan downloads</a></p>
<p style="padding-left:30px;">Popular gaming site for MMO mods.</p>
<p><a href="http://aoc.curse.com/downloads/details/12783/">AoC Custom UI Patcher</a></p>
<p style="padding-left:30px;">Every time Funcom releases a patch, it breaks custom UIs. This external program updates the files with the latest version number, so that they work properly with the newest patch.</p>
<p><strong>Combat Log Parser</strong></p>
<p><a href="http://aoc.curse.com/downloads/details/12734/">AoCombaTotals</a></p>
<p style="padding-left:30px;">An external program that parses your combat log in real time.  It also has a small window that displays your damage/sec and xp/sec.  If you are running AoC in windowed mode, it allows you to see real time data.  Very useful.</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[An HTML parser in VB.NET]]></title>
<link>http://giovannicalcerano.wordpress.com/?p=13</link>
<pubDate>Mon, 19 May 2008 18:12:39 +0000</pubDate>
<dc:creator>giovannicalcerano</dc:creator>
<guid>http://giovannicalcerano.wordpress.com/?p=13</guid>
<description><![CDATA[I created a VB.NET class that can parse a string containing HTML code; the st]]></description>
<content:encoded><![CDATA[<p>I created a VB.NET class that can parse a string containing HTML code. The string can optionally be loaded from a URL through a dedicated method of the class. The code can be downloaded <a href="http://automatic-asp-fill-in-form.googlecode.com/files/HtmlParser.zip">from this link</a>.</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[Batteries Included: Lazy Lists version 0.3]]></title>
<link>http://dutherenverseauborddelatable.wordpress.com/?p=79</link>
<pubDate>Sun, 18 May 2008 18:40:52 +0000</pubDate>
<dc:creator>yoric</dc:creator>
<guid>http://dutherenverseauborddelatable.wordpress.com/?p=79</guid>
<description><![CDATA[An updated version of the Lazy List module for OCaml has just been uploaded to Batteries Included an]]></description>
<content:encoded><![CDATA[<p style="text-align:justify;">An updated version of the Lazy List module for OCaml has just been uploaded to <a href="https://forge.ocamlcore.org/frs/?group_id=17">Batteries Included</a> and submitted to <a href="http://code.google.com/p/ocaml-extlib/">ExtLib</a>. See the release notes for more details.</p>
<p style="text-align:justify;">In addition, I am currently using this module to write a parser combinator library for OCaml. This library has reached early testing stage and will hopefully be added to Batteries Included soon.</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[Converting C# Code to Flowchart]]></title>
<link>http://amrox.wordpress.com/2008/05/03/converting-c-code-to-flowchart/</link>
<pubDate>Sat, 03 May 2008 17:34:02 +0000</pubDate>
<dc:creator>amrox</dc:creator>
<guid>http://amrox.wordpress.com/2008/05/03/converting-c-code-to-flowchart/</guid>
<description><![CDATA[Introduction:

Expressing the program logic in a diagram give it the advantage of making it more und]]></description>
<content:encoded><![CDATA[<p><span style="font-size:14pt;"><strong>Introduction:<br />
</strong></span></p>
<p style="text-align:justify;"><span style="font-family:Franklin Gothic Book;">Expressing program logic as a diagram makes it easier to understand, because the human brain tends to think graphically. Many developers document their code with flowcharts, so it helps to have a tool that does this job automatically. Here we use a recursive descent parser that parses C# code and draws the corresponding flowchart.<br />
</span></p>
<p style="text-align:justify;"><span style="font-size:14pt;"><strong>C# flow control:<br />
</strong></span></p>
<p style="text-align:justify;"><span style="font-family:Franklin Gothic Book;">The C# language has several flow-control statements:<br />
</span></p>
<ol>
<li>
<div style="text-align:justify;"><span style="font-family:Franklin Gothic Book;">If-else.<br />
</span></div>
</li>
<li>
<div style="text-align:justify;"><span style="font-family:Franklin Gothic Book;">While.<br />
</span></div>
</li>
<li>
<div style="text-align:justify;"><span style="font-family:Franklin Gothic Book;">Do-while.<br />
</span></div>
</li>
<li>
<div style="text-align:justify;"><span style="font-family:Franklin Gothic Book;">For.<br />
</span></div>
</li>
<li>
<div style="text-align:justify;"><span style="font-family:Franklin Gothic Book;">Foreach.<br />
</span></div>
</li>
<li>
<div style="text-align:justify;"><span style="font-family:Franklin Gothic Book;">Switch.<br />
</span></div>
</li>
</ol>
<p style="text-align:justify;"><span style="font-family:Franklin Gothic Book;">Each of these statements affects the flow of execution in C# code. Two of them are decision statements (if-else and switch). The others are loop statements (while, do-while, for and foreach).<br />
</span></p>
<p style="text-align:justify;"><span style="font-family:Franklin Gothic Book;">A decision statement guards a block of code with a condition. If the condition holds, the block is executed; otherwise, it is skipped. The if-else statement does exactly this, with an optional alternative block. Its CFG is:<br />
</span></p>
<p style="text-align:justify;background:#f3f3f3;"><span style="font-family:Franklin Gothic Book;">If-statement := <strong>if</strong><br />
<strong>(</strong>condition<strong>)</strong><br />
<strong>{</strong> statements1 <strong>} </strong>[<strong>else {</strong> statements2 <strong>}</strong>]<br />
</span></p>
<p><span style="font-family:Franklin Gothic Book;">The if part is required, but the else part is optional, so the flowchart of an if-else statement takes one of two forms:<br />
</span></p>
<p><img src="http://amrox.files.wordpress.com/2008/05/050308-1733-convertingc1.png" alt="" /><img src="http://amrox.files.wordpress.com/2008/05/050308-1733-convertingc2.png" alt="" /><span style="font-family:Franklin Gothic Book;"><br />
</span></p>
<p><img src="http://amrox.files.wordpress.com/2008/05/050308-1733-convertingc3.png" alt="" /><span style="font-family:Franklin Gothic Book;"><br />
</span></p>
<p><span style="font-family:Franklin Gothic Book;">The switch statement directs the program down one of several paths. The CFG of the switch statement is:<br />
</span></p>
<p style="background:#f3f3f3;"><span style="font-family:Franklin Gothic Book;">Switch_statement  := <strong>switch</strong><br />
<strong>(</strong>expression<strong>) {</strong> {<strong> case</strong> value<strong>:</strong> statements <strong>break; </strong>} <strong> }<br />
</strong></span></p>
<p><span style="font-family:Franklin Gothic Book;">Because the switch statement has multiple paths, its flowchart takes this form:<br />
</span></p>
<p><img src="http://amrox.files.wordpress.com/2008/05/050308-1733-convertingc4.png" alt="" /><span style="font-family:Franklin Gothic Book;"><br />
</span></p>
<p><span style="font-family:Microsoft Sans Serif;">Loop statements repeat a block of statements as long as a condition holds. From one loop statement to another, only the loop expression and its position change. The do-while loop has the following CFG:<br />
</span></p>
<p style="background:#f3f3f3;"><span style="font-family:Microsoft Sans Serif;">Do-while := <strong>do</strong><br />
<strong>{</strong> statements <strong>} while</strong><br />
<strong>(</strong>condition <strong>);</strong><br />
</span></p>
<p><span style="font-family:Microsoft Sans Serif;">It takes this form:<br />
</span></p>
<p><img src="http://amrox.files.wordpress.com/2008/05/050308-1733-convertingc5.png" alt="" /></p>
<p><span style="font-family:Microsoft Sans Serif;">The other loop statements all share the same flowchart form. Their CFGs are:<br />
</span></p>
<p style="background:#f3f3f3;"><span style="font-family:Microsoft Sans Serif;">While-statement:= <strong>while</strong><br />
<strong>(</strong>condition <strong>)</strong><br />
<strong>{</strong>statements<strong>}</strong><br />
</span></p>
<p style="background:#f3f3f3;"><span style="font-family:Microsoft Sans Serif;">For-statement:= <strong>for</strong><br />
<strong>(</strong>for-Exp<strong>)</strong><br />
<strong>{</strong> statements <strong>}</strong><br />
</span></p>
<p style="background:#f3f3f3;"><span style="font-family:Microsoft Sans Serif;">Foreach := <strong>foreach(</strong>foreach-exp<strong>)</strong><br />
<strong>{</strong>statements <strong>}</strong><br />
</span></p>
<p><span style="font-family:Microsoft Sans Serif;">The flowchart form of these statements is:<br />
</span></p>
<p><img src="http://amrox.files.wordpress.com/2008/05/050308-1733-convertingc6.png" alt="" /></p>
<p><span style="font-size:14pt;"><strong>Process<br />
</strong></span></p>
<p style="text-align:justify;"><img src="http://amrox.files.wordpress.com/2008/05/050308-1733-convertingc7.png" alt="" /><span style="font-family:Microsoft Sans Serif;">The process starts when new code is entered. The code goes into a scanner that outputs tokens. The tokens are fed to a recursive descent parser, which produces a parse tree representing the entered code. The parse tree is then passed to the drawing module, which displays it as a diagram.<br />
</span></p>
<p><span style="font-size:14pt;"><strong>Scanner<br />
</strong></span></p>
<p><span style="font-family:Microsoft Sans Serif;">The scanner splits the code into tokens. Each token belongs to one of these types:<br />
</span></p>
<ol>
<li><span style="font-family:Microsoft Sans Serif;">Flow-control keyword: a keyword for one of the six flow-control statements discussed in the C# flow control section.<br />
</span></li>
<li><span style="font-family:Microsoft Sans Serif;">Block start: like "{" symbol.<br />
</span></li>
<li><span style="font-family:Microsoft Sans Serif;">Block end: like "}" symbol.<br />
</span></li>
<li><span style="font-family:Microsoft Sans Serif;">Whitespace.<br />
</span></li>
<li><span style="font-family:Microsoft Sans Serif;">Separators: symbols that separate two tokens.<br />
</span></li>
</ol>
<p><span style="font-family:Microsoft Sans Serif;">Based on their types, the tokens take their places in the parse tree described in the next section.</span></p>
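<p>Here is a minimal Python sketch of such a scanner. It uses one regular expression with a named group per token type. The type names, the keyword set and the separator characters are assumptions made for this illustration. String literals and comments are not handled.</p>

```python
import re

# Keywords that start the six flow-control statements (else belongs to if-else)
FLOW_KEYWORDS = {"if", "else", "switch", "while", "do", "for", "foreach"}

# One named group per token type; "word" is reclassified in scan()
TOKEN_RE = re.compile(r"""
    (?P<block_start>\{)
  | (?P<block_end>\})
  | (?P<whitespace>\s+)
  | (?P<separator>[();,:])
  | (?P<word>[^\s{}();,:]+)
""", re.VERBOSE)


def scan(code):
    """Split C# source into (type, text) tokens."""
    tokens = []
    for match in TOKEN_RE.finditer(code):
        kind, text = match.lastgroup, match.group()
        if kind == "word":
            kind = "flow_keyword" if text in FLOW_KEYWORDS else "other"
        tokens.append((kind, text))
    return tokens
```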
<p><span style="font-size:14pt;"><strong>Parse Tree<br />
</strong></span></p>
<p style="text-align:justify;"><span style="font-family:Microsoft Sans Serif;">The parse tree used in this tool must meet two criteria: it must represent sequential statements, and it must represent nested statements. We therefore use a tree structure built from these classes:<br />
</span></p>
<pre style="background:#f3f3f3;">using System.Collections.Generic;

// stack of open nodes; the node on top receives the next tokens
Stack&#60;TreeNode&#62; current = new Stack&#60;TreeNode&#62;();
current.Push(new TreeNode("main block"));

class TreeNode
{
    // flowchart shape of this node: decision, loop start/end, start or end state
    // (Block is defined elsewhere in the tool; a Block(string) constructor is assumed)
    public Block Current;
    // relations from this node to its child nodes
    public List&#60;Relation&#62; Relations = new List&#60;Relation&#62;();

    public TreeNode(string name) { Current = new Block(name); }
}

class Relation
{
    public string Name;
    public TreeNode From;
    public TreeNode To;
}</pre>
<p style="text-align:justify;"><span style="font-family:Microsoft Sans Serif;">TreeNode represents a node in the parse tree. Each node holds a Block object that represents one shape in the flowchart: a decision, a loop start, a loop end, or the process's start or end state. Each TreeNode also holds a collection of relations linking it to its child nodes.</span></p>
<p style="text-align:justify;"><span style="font-size:14pt;"><strong>Parser<br />
</strong></span></p>
<p style="text-align:justify;"><span style="font-family:Microsoft Sans Serif;">The parser takes the tokens and puts each one in its place in the parse tree, according to the token's type. If the token is a flow-control keyword, a new node is added as a child of the node on top of the stack, and the new node is pushed onto the stack. If the token is not a flow-control token, it is added to the "current" block of the node on top of the stack. Finally, if the token is a block-termination token (like } ), the top node is popped off the stack.<br />
</span></p>
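<p>The stack-based procedure above can be sketched in Python as follows. This is an illustration rather than the tool's code. It assumes that tokens arrive as <code>(type, text)</code> pairs and that every flow-control keyword opens its own braced block. Braceless bodies, <code>else if</code> chains and the trailing <code>while</code> of a do-while would need extra handling.</p>

```python
class TreeNode:
    """A parse-tree node: a flow-control statement (or the main block)."""

    def __init__(self, label):
        self.label = label
        self.statements = []  # non-flow tokens in this node's "current" block
        self.children = []    # nested flow-control nodes, in source order


def build_tree(tokens):
    """Place (type, text) tokens into a tree using a stack of open nodes."""
    root = TreeNode("main block")
    stack = [root]
    for kind, text in tokens:
        if kind == "flow_keyword":
            node = TreeNode(text)
            stack[-1].children.append(node)  # child of the node on top of the stack
            stack.append(node)               # ...which becomes the new top
        elif kind == "block_end":
            if len(stack) > 1:               # never pop the main block
                stack.pop()
        elif kind in ("block_start", "whitespace"):
            continue                         # structure only, nothing to store
        else:
            stack[-1].statements.append(text)
    return root
```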
<p style="text-align:justify;"><span style="font-size:14pt;"><strong>Drawer<br />
</strong></span></p>
<p style="text-align:justify;"><span style="font-family:Microsoft Sans Serif;">Once the parser has produced the parse tree, the Drawer takes the tree and draws a flowchart representing it. The Drawer uses the following algorithm:<br />
</span></p>
<p style="text-align:center;"><img src="http://amrox.files.wordpress.com/2008/05/050308-1733-convertingc8.png" alt="" /></p>
<p style="text-align:justify;">A layout algorithm is also needed to decide where the Drawer places each shape. Nodes that share a parent get the same y-position, while each child's x-position is incremented relative to the previous child.</p>
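<p>Here is one way to implement that layout rule, sketched in Python. It assumes that each child is placed to the right of its previous sibling's whole subtree, so that subtrees do not overlap; the original only says that the x-position is incremented. The unit steps <code>dx</code>/<code>dy</code> and the <code>Node</code> class are illustrative, and labels are assumed to be unique.</p>

```python
class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)


def layout(root, dx=1, dy=1):
    """Return {label: (x, y)} for every node in the tree."""
    positions = {}

    def place(node, x, y):
        positions[node.label] = (x, y)
        next_x = x
        for child in node.children:
            # siblings share y; each starts where the previous subtree ended
            next_x = place(child, next_x, y + dy)
        # first free column to the right of this subtree
        return max(next_x, x + dx)

    place(root, 0, 0)
    return positions
```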
<p style="text-align:justify;"><span style="font-size:14pt;"><strong>Conclusion<br />
</strong></span></p>
<p>To convert C# code into a flowchart, you need to parse the code, looking for flow-control statements and code blocks. Once you have a well-structured parse tree, drawing the flowchart is very simple.</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[The Definitive ANTLR Reference (Korean)]]></title>
<link>http://mhjung.wordpress.com/?p=16</link>
<pubDate>Thu, 01 May 2008 12:15:01 +0000</pubDate>
<dc:creator>Hee</dc:creator>
<guid>http://mhjung.wordpress.com/?p=16</guid>
<description><![CDATA[For building domain-specific languages, the ANTLR (ANother Tool for Language Recognition) Reference doc]]></description>
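<content:encoded><![CDATA[<p>I am in the middle of translating the ANTLR (ANother Tool for Language Recognition) Reference, a guide to building domain-specific languages.</p>
<ul>
<li> <span style="color:#0000ff;"><strong>What is ANTLR?</strong></span></li>
</ul>
<p style="padding-left:30px;">ANTLR is a language tool that provides a framework for constructing recognizers, interpreters, compilers, and translators from grammatical descriptions containing actions in a variety of target languages. ANTLR provides excellent support for tree construction, tree walking, translation, error recovery, and error reporting. ANTLR source is currently downloaded about 5,000 times a month.</p>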
<p>Official site: <a title="ANTLR" href="http://www.antlr.org/" target="_blank">http://www.antlr.org/</a></p>
<p>Translated document: the document is copyrighted, so I have stopped publishing it. Once the translation is done, I will write a guide and post it.</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[Tagsoup]]></title>
<link>http://baskarfelix.wordpress.com/?p=15</link>
<pubDate>Tue, 15 Apr 2008 18:07:20 +0000</pubDate>
<dc:creator>baskarfelix</dc:creator>
<guid>http://baskarfelix.wordpress.com/?p=15</guid>
<description><![CDATA[TagSoup is a library for extracting information out of unstructured HTML code, sometimes known as ta]]></description>
<content:encoded><![CDATA[<p>TagSoup is a library for extracting information out of unstructured HTML code, sometimes known as tag-soup. The HTML does not have to be well formed, or render properly within any particular framework. This library is for situations where the author of the HTML is not cooperating with the person trying to extract the information, but is also not trying to hide the information.</p>
<p>For more details, see: <a href="http://mercury.ccil.org/~cowan/XML/tagsoup/">http://mercury.ccil.org/~cowan/XML/tagsoup/</a></p>
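<p>As a loose Python analogue of the same idea (not TagSoup's API), the standard <code>html.parser</code> module also tolerates badly formed markup. This sketch collects links even when the <code>a</code> tags are never closed. The class name is an assumption for this example, as is the rule that an unclosed link ends where the next one starts.</p>

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs from links, even in badly formed HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.links = []
        self._current = None  # (href, list of text pieces) while inside a link

    def _finish(self):
        if self._current is not None:
            href, pieces = self._current
            # collapse runs of whitespace in the link text
            self.links.append((href, " ".join("".join(pieces).split())))
            self._current = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._finish()  # assumption: an unclosed link ends at the next one
            self._current = (dict(attrs).get("href"), [])

    def handle_endtag(self, tag):
        if tag == "a":
            self._finish()

    def handle_data(self, data):
        if self._current is not None:
            self._current[1].append(data)

    def close(self):
        super().close()  # flush any buffered text first
        self._finish()   # then end a link left open at the end of the input
```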
]]></content:encoded>
</item>
<item>
<title><![CDATA[Flexible MXML Editor]]></title>
<link>http://nsdevaraj.wordpress.com/?p=28</link>
<pubDate>Wed, 09 Apr 2008 11:38:11 +0000</pubDate>
<dc:creator>nsdevaraj</dc:creator>
<guid>http://nsdevaraj.wordpress.com/?p=28</guid>
<description><![CDATA[click here for AIR version of Flexible, for editing and modifying the MXML. Alternative for Flex Bui]]></description>
<content:encoded><![CDATA[<p><a href="http://www.esnips.com/web/Flexible/">Click here</a> for the AIR version of Flexible, a tool for editing and modifying MXML and an alternative to Flex Builder. It works as an MXML parser. I plan to make it generate SWF instead of MXML at runtime.</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[Haskell-style Parser Combinators in Scheme]]></title>
<link>http://shaurz.wordpress.com/?p=16</link>
<pubDate>Tue, 11 Mar 2008 02:53:58 +0000</pubDate>
<dc:creator>shaurz</dc:creator>
<guid>http://shaurz.wordpress.com/?p=16</guid>
<description><![CDATA[One evening I thought it would be cool to implement parser combinators in Scheme. The results are exp]]></description>
<content:encoded><![CDATA[<p>One evening I thought it would be cool to implement parser combinators in Scheme. The results are explained in this blog post.</p>
<p><b>Download:</b> <a href="http://www.iogopro.co.uk/blog/files/parser-combinators.scm">parser-combinators.scm</a></p>
<p>The code will run in <a href="http://www.call-with-current-continuation.org/">Chicken Scheme</a>.</p>
<p>We can define a parser as a procedure that takes an input string and an index (the position of the current character in the input). If the parse succeeds, it returns a value and the index of the remaining, unconsumed input, which is ≥ the original index. On failure it returns #f for both the value and the index (callers check the index for failure and ignore the value).</p>
<p>The two simplest parsers are <code>fail</code>, which always fails, and <code>(return v)</code>, which always succeeds and returns v without consuming any input.</p>
<pre>
(define fail (lambda (s i) (values #f #f)))
(define (return v) (lambda (s i) (values v i)))</pre>
<p>The parser <code>any-char</code> consumes one character of the input, or fails if the end of the input has been reached.</p>
<pre>
(define any-char
  (lambda (s i)
    (if (&#60; i (string-length s))
        (values (string-ref s i) (+ i 1))
        (values #f #f))))</pre>
<p>On their own, parser procedures are cumbersome to combine. Haskell provides a neat solution: <code>do</code> notation. Now for some Macro Magic. Let's write a macro called <code>parser</code> which allows us to write Haskell-style code.</p>
<pre>
(define-for-syntax *v-name* (gensym 'v))
(define-for-syntax *s-name* (gensym 's))
(define-for-syntax *i-name* (gensym 'i))

(define-for-syntax (expand-parser-body forms)
  (match forms
    [(_ '&#60;- _)
       (error "parser must end with non-binding form")]
    [(p)
       `(,p ,*s-name* ,*i-name*)]
    [(v '&#60;- p . xs)
       `(let-values ([(,v ,*i-name*) (,p ,*s-name* ,*i-name*)])
          (if ,*i-name*
              ,(expand-parser-body xs)
              (values #f #f)))]
    [(p . xs)
       `(let-values ([(,*v-name* ,*i-name*) (,p ,*s-name* ,*i-name*)])
          (if ,*i-name*
              ,(expand-parser-body xs)
              (values #f #f)))]))

(define-macro (parser . body)
  `(lambda (,*s-name* ,*i-name*)
     ,(expand-parser-body body)))</pre>
<p>Now we can write parsers which look and work just like Haskell code!</p>
<pre>
(define two-chars-swap
  (parser
    a &#60;- any-char
    b &#60;- any-char
    (return (string b a))))</pre>
<p>This example shows how we can use the any-char parser to write a parser that reads two characters and returns them as a string in reverse order. The results of any-char are bound to local variables, which can be used immediately after they are bound.</p>
<p>How does this work? Let's look at what two-chars-swap expands into.</p>
<pre>
1:  (lambda (s3 i4)
2:    (let-values ([(a i4) (any-char s3 i4)])
3:      (if i4
4:          (let-values ([(b i4) (any-char s3 i4)])
5:            (if i4
6:                ((return (string b a)) s3 i4)
7:                (values #f #f)))
8:          (values #f #f))))</pre>
<p>In line 1 we see that the parser is actually a function, which takes a string (s3) and an index (i4), as expected (these are gensym'd variable names). Line 2 binds the result of calling any-char with the string and index to the variable a defined by the user. Notice how the call is implicit in the unexpanded form. The new index is re-bound to i4, shadowing the original index. Line 3 tests the index to see if the parse failed. Failure here causes failure for the whole parser (line 8). Otherwise we continue and call any-char again with the new index, binding the result to b and shadowing i4 just like in line 2. Line 5 checks for failure (line 7). Finally we call the (return (string b a)) parser, whose result is also the result of the whole parser.</p>
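<p>For readers more at home outside Scheme, the expansion above is Haskell's monadic bind (<code>&gt;&gt;=</code>) written out by hand. The Python sketch below is an illustrative translation, not part of parser-combinators.scm: the names <code>bind</code>, <code>ret</code> and <code>any_char</code> are hypothetical, and <code>None</code> plays the role of <code>#f</code>. Each <code>x &#60;- p</code> line corresponds to one <code>bind</code>: run <code>p</code>, fail if there is no index, otherwise pass the value on to the rest of the parser.</p>

```python
# Illustrative sketch (hypothetical names): a parser is a function
# (s, i) -> (value, new_index), or (None, None) on failure.

def ret(v):
    """Succeed with v without consuming input (the post's `return`)."""
    return lambda s, i: (v, i)


def any_char(s, i):
    """Consume one character, or fail at the end of the input."""
    if i < len(s):
        return s[i], i + 1
    return None, None


def bind(p, f):
    """Run p; on success pass its value to f, which returns the next parser.

    This mirrors one `x <- p` line of the `parser` macro: call p, check the
    index, and short-circuit with failure if it is missing.
    """
    def parse(s, i):
        v, j = p(s, i)
        if j is None:
            return None, None
        return f(v)(s, j)
    return parse


# Equivalent of the post's two-chars-swap parser.
two_chars_swap = bind(any_char, lambda a:
                      bind(any_char, lambda b:
                           ret(b + a)))
```

<p>Calling <code>two_chars_swap("xy", 0)</code> returns <code>("yx", 2)</code>; on a one-character input the second <code>any_char</code> fails, so the whole parser returns <code>(None, None)</code>, just as line 7 of the expansion does.</p>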
<p>No parser demo is complete without an implementation of an infix calculator language. One disadvantage of parser combinators is that left-recursive grammars are not allowed (a left-recursive rule calls itself without consuming any input, which causes an infinite loop). This can be overcome by using a loop to slurp up sequences of operators of the same precedence, e.g. <code>1 + 2 + 3</code>. See <a href="http://www.engr.mun.ca/~theo/Misc/exp_parsing.htm">this page</a> for a good explanation. Curiously, most examples I found on the web were wrong (they would parse <code>"1 + 2"</code> and ignore <code>"+ 3"</code>). I guess nobody actually tests their grammars.</p>
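<p>The difference between those broken examples and the loop approach shows up in a small sketch. Again this is an illustrative Python translation with hypothetical names (<code>number</code>, <code>operator</code>, <code>sum_once</code>, <code>sum_loop</code>), following the post's (value, index) convention with <code>None</code> for failure. <code>sum_once</code> has the common <code>term [op term]</code> shape; <code>sum_loop</code> keeps consuming operators of the same precedence and builds a left-associative tree, as the Scheme <code>expr</code> and <code>term</code> parsers do with <code>let loop</code>.</p>

```python
# Illustrative sketch (hypothetical names): parsers take (s, i) and return
# (value, new_index) on success or (None, None) on failure.

def number(s, i):
    """Parse a run of decimal digits, skipping leading spaces."""
    while i < len(s) and s[i] == ' ':
        i += 1
    j = i
    while j < len(s) and s[j].isdigit():
        j += 1
    if j == i:
        return None, None
    return int(s[i:j]), j


def operator(ops):
    """Return a parser accepting one character from ops, after spaces."""
    def parse(s, i):
        while i < len(s) and s[i] == ' ':
            i += 1
        if i < len(s) and s[i] in ops:
            return s[i], i + 1
        return None, None
    return parse


add_op = operator('+-')


def sum_once(s, i):
    """The buggy shape: term [op term]. Stops after a single operator."""
    lhs, i = number(s, i)
    if i is None:
        return None, None
    op, j = add_op(s, i)
    if j is None:
        return lhs, i
    rhs, k = number(s, j)
    if k is None:
        return lhs, i
    return (op, lhs, rhs), k


def sum_loop(s, i):
    """Loop over same-precedence operators, building a left-associative tree."""
    lhs, i = number(s, i)
    if i is None:
        return None, None
    while True:
        op, j = add_op(s, i)
        if j is None:
            return lhs, i
        rhs, k = number(s, j)
        if k is None:
            return lhs, i  # leave a dangling operator unconsumed
        lhs, i = (op, lhs, rhs), k
```

<p>On <code>"1 + 2 + 3"</code> (9 characters), <code>sum_once</code> returns <code>(('+', 1, 2), 5)</code> and leaves <code>"+ 3"</code> unconsumed, while <code>sum_loop</code> returns <code>(('+', ('+', 1, 2), 3), 9)</code>. Left associativity matters for subtraction: <code>sum_loop("10 - 4 - 3", 0)</code> yields the tree <code>('-', ('-', 10, 4), 3)</code>, which evaluates to 3, not 9.</p>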
<p>Before we can write interesting parsers we need a few low-level utilities. The <code>choice</code> procedure takes a list of parsers and returns a parser which tries each parser in sequence until one of them succeeds. This gives us basic backtracking.</p>
<pre>
(define (choice . ps)
  (lambda (s i)
    (let loop ([p ps])
      (if (pair? p)
          (let-values ([(v i) ((car p) s i)])
            (if i
                (values v i)
                (loop (cdr p))))
          (values #f #f)))))</pre>
<p>The <code>matches</code> procedure returns a parser which matches a particular string or fails. This is useful for matching symbols or keywords.</p>
<pre>
(define (matches m)
  (lambda (s i)
    (let ([n (string-length m)])
      (if (and (&#60;= (+ i n) (string-length s))
               (string=? m (substring s i (+ i n))))
          (values (substring s i (+ i n)) (+ i n))
          (values #f #f)))))</pre>
<p>The <code>while-char</code> procedure returns a parser which accepts characters while the character predicate holds. <code>while1-char</code> works similarly but requires at least one character.</p>
<pre>
(define (while-char pred)
  (lambda (s i)
    (let ([len (string-length s)])
      (let loop ([j i])
        (if (and (&#60; j len) (pred (string-ref s j)))
            (loop (+ j 1))
            (values (substring s i j) j))))))

(define (while1-char pred)
  (parser
    s &#60;- (while-char pred)
    (if (&#62; (string-length s) 0) (return s) fail)))</pre>
<p>In the calculator language we need to match decimal integers. Thanks to <code>while1-char</code> this is very easy!</p>
<pre>
(define decimal
  (parser
    s &#60;- (while1-char digit?)
    (return (string-&#62;number s))))</pre>
<p>To accept spaces around numbers and operators we use <code>token</code> which slurps up spaces before calling the given parser.</p>
<pre>
(define (token p)
  (parser
    (while-char space?)
    x &#60;- p
    (return x)))</pre>
<p>Finally we have enough to define the classical term/factor/expr parser. This version returns the s-expression instead of calculating the result.</p>
<pre>
(define expr
  (parser
    lhs &#60;- term
    (let loop ([lhs lhs])
      (choice
        (parser
          opr &#60;- (token (choice (matches "+") (matches "-")))
          rhs &#60;- term
          (loop (list (string-&#62;symbol opr) lhs rhs)))
        (return lhs)))))

(define term
  (parser
    lhs &#60;- factor
    (let loop ([lhs lhs])
      (choice
        (parser
          opr &#60;- (token (choice (matches "*") (matches "/")))
          rhs &#60;- factor
          (loop (list (string-&#62;symbol opr) lhs rhs)))
        (return lhs)))))

(define factor
  (choice
    (token decimal)
    (parser
      (token (matches "("))
      e &#60;- expr
      (token (matches ")"))
      (return e))
    (parser
      (token (matches "-"))
      e &#60;- factor
      (return (list '- e)))))</pre>
<p>Now we can use <code>test-parser</code> (see the code file for details) to test the expression parser.</p>
<pre>
#;1&#62; (test-parser expr)
&#62;&#62; 1 + 5/3 * (8 + (9 - -4)) / (7*7 + 6) + 2
Parsed    : "1 + 5/3 * (8 + (9 - -4)) / (7*7 + 6) + 2" (40 characters)
Returned  : (+ (+ 1 (/ (* (/ 5 3) (+ 8 (- 9 (- 4)))) (+ (* 7 7) 6))) 2)</pre>
]]></content:encoded>
</item>
<item>
<title><![CDATA[Parsing XML in Java with SAX]]></title>
<link>http://lefunes.wordpress.com/?p=71</link>
<pubDate>Sat, 01 Mar 2008 12:53:00 +0000</pubDate>
<dc:creator>lefunes</dc:creator>
<guid>http://lefunes.wordpress.com/?p=71</guid>
<description><![CDATA[SAX is an API written entirely in Java and included in the JRE that lets us create our own]]></description>
<content:encoded><![CDATA[<p>SAX is an API written entirely in Java and included in the JRE that lets us create our own XML parser.</p>
<p>We will see how to build a generic XML parser and how we can adapt it to our needs.</p>
<p><!--more-->The most important classes and interfaces to keep in mind are:</p>
<p><b>Interface org.xml.sax.XMLReader:</b><br />
The interface that every XML reader must implement.<br />
Each time the <code>XMLReader</code> encounters the start of the document, its end, an element, character data, whitespace, etc., it notifies the associated <code>ContentHandler</code>.<br />
Each time it encounters an error, it notifies the associated <code>ErrorHandler</code>.</p>
<p><b>Interface org.xml.sax.ContentHandler:</b><br />
A class that implements it can receive all content notifications from an XMLReader.<br />
It is attached to an XMLReader through the <code>setContentHandler</code> method.</p>
<p><b>Interface org.xml.sax.ErrorHandler:</b><br />
A class that implements it can receive all error notifications produced by an XMLReader.<br />
It is attached to an XMLReader through the <code>setErrorHandler</code> method.<br />
If no ErrorHandler is attached, the XMLReader silently ignores warnings and recoverable errors; fatal errors are still reported by throwing a <code>SAXParseException</code>.</p>
<p><b>Class org.xml.sax.helpers.DefaultHandler</b>:<br />
A class that implements both ContentHandler and ErrorHandler (as well as DTDHandler and EntityResolver, which we won't cover for now), providing default implementations for all of their methods.</p>
<p>This is the class we will extend to create our own XML parser.</p>
<p><b>Class org.xml.sax.helpers.XMLReaderFactory</b>:<br />
A class that provides static methods for creating XMLReaders.<br />
We will use the static method <code>XMLReaderFactory.createXMLReader()</code>, which returns the default XMLReader for our system.</p>
<p><b>Creating the reader</b><br />
We do it as follows:</p>
<p>[sourcecode language="java"]<br />
class LectorXML extends DefaultHandler {</p>
<p>    private final XMLReader xr;</p>
<p>    public LectorXML() throws SAXException {<br />
        xr = XMLReaderFactory.createXMLReader();<br />
        xr.setContentHandler(this);<br />
        xr.setErrorHandler(this);<br />
    }<br />
}[/sourcecode]<br />
So far we have created a class <code>LectorXML</code> that extends <code>DefaultHandler</code>; in its constructor we create the <code>XMLReader</code> through <code>XMLReaderFactory</code>.<br />
We register the <code>LectorXML</code> itself as the <code>XMLReader</code>'s <code>ContentHandler</code> and <code>ErrorHandler</code>. Next, we add a method to read XML files:<br />
[sourcecode language="java"]<br />
class LectorXML extends DefaultHandler {</p>
<p>    private final XMLReader xr;</p>
<p>    public LectorXML() throws SAXException {<br />
        xr = XMLReaderFactory.createXMLReader();<br />
        xr.setContentHandler(this);<br />
        xr.setErrorHandler(this);<br />
    }</p>
<p>    public void leer(final String archivoXML)<br />
             throws FileNotFoundException, IOException,<br />
                       SAXException {<br />
        FileReader fr = new FileReader(archivoXML);<br />
        xr.parse(new InputSource(fr));<br />
    }<br />
}[/sourcecode]<br />
At this point we are ready to test our reader:<br />
[sourcecode language="java"]<br />
public class PruebaSAX {<br />
    public static void main(String[] args)<br />
              throws FileNotFoundException, IOException,<br />
                        SAXException {<br />
        LectorXML lector = new LectorXML();<br />
        lector.leer("test.xml");<br />
    }<br />
}[/sourcecode]<br />
If we run it, nothing is printed (unless the path to the XML file is wrong). This is because <code>DefaultHandler</code> provides empty implementations of its methods. So let's override some of them to see some output:<br />
[sourcecode language="java"]<br />
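import java.io.FileNotFoundException;<br />
import java.io.FileReader;<br />
import java.io.IOException;<br />
import org.xml.sax.Attributes;<br />
import org.xml.sax.InputSource;<br />
import org.xml.sax.SAXException;<br />
import org.xml.sax.XMLReader;<br />
import org.xml.sax.helpers.DefaultHandler;<br />
import org.xml.sax.helpers.XMLReaderFactory;<br />
<br />
class LectorXML extends DefaultHandler {</p>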
<p>    private final XMLReader xr;</p>
<p>    public LectorXML() throws SAXException {<br />
        xr = XMLReaderFactory.createXMLReader();<br />
        xr.setContentHandler(this);<br />
        xr.setErrorHandler(this);<br />
    }</p>
<p>    public void leer(final String archivoXML)<br />
             throws FileNotFoundException, IOException,<br />
                       SAXException {<br />
        FileReader fr = new FileReader(archivoXML);<br />
        xr.parse(new InputSource(fr));<br />
    }</p>
<p>    @Override<br />
    public void startDocument() {<br />
        System.out.println("Start of XML document");<br />
    }</p>
<p>    @Override<br />
    public void endDocument() {<br />
        System.out.println("End of XML document");<br />
    }</p>
<p>    @Override<br />
    public void startElement(String uri, String name,<br />
              String qName, Attributes atts) {<br />
        System.out.println("\tElement: " + name);</p>
<p>        for (int i = 0; i < atts.getLength(); i++) {<br />
         System.out.println("\t\tAttribute: " +<br />
          atts.getLocalName(i) + " = "+ atts.getValue(i));<br />
        }<br />
    }</p>
<p>    @Override<br />
    public void endElement(String uri, String name,<br />
                                 String qName) {<br />
        System.out.println("\tEnd of element: " + name);<br />
    }<br />
}[/sourcecode]<br />
If we run it now, it prints the start and end of the document, as well as every element in our document along with its attributes.</p>
<p>In the same way, we can override other methods to see the character data in the XML (text, whitespace, line breaks, etc.), or to display the errors and exceptions raised while reading and parsing.</p>
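<p>The post's code is Java; as a compact, runnable illustration of the same idea, here is a sketch using Python's standard <code>xml.sax</code> module, whose <code>ContentHandler</code> and <code>ErrorHandler</code> mirror the Java interfaces (<code>characters</code>, <code>warning</code>, <code>error</code>, <code>fatalError</code>). The names <code>TextCollector</code> and <code>read_xml</code> are hypothetical. Note that <code>characters</code> may be called several times for a single text node, so the sketch collects the chunks and joins them at the end; the same caveat applies to Java's <code>ContentHandler.characters</code>.</p>

```python
import xml.sax
from xml.sax.handler import ContentHandler, ErrorHandler


class TextCollector(ContentHandler, ErrorHandler):
    """Hypothetical handler: collects character data and records errors."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.chunks = []
        self.errors = []

    def characters(self, content):
        # May be called several times per text node, and also for the
        # whitespace between elements.
        self.chunks.append(content)

    def warning(self, exception):
        self.errors.append(("warning", str(exception)))

    def error(self, exception):
        # Recoverable problems (rarely reported by the default expat reader).
        self.errors.append(("error", str(exception)))

    def fatalError(self, exception):
        # Record the well-formedness error, then stop parsing as the
        # default handler would.
        self.errors.append(("fatal", str(exception)))
        raise exception


def read_xml(data):
    """Parse bytes; return (joined character data, list of recorded errors)."""
    handler = TextCollector()
    try:
        xml.sax.parseString(data, handler, handler)
    except xml.sax.SAXParseException:
        pass  # already recorded by fatalError
    return "".join(handler.chunks), handler.errors
```

<p>Re-raising in <code>fatalError</code> matches the default behaviour of <code>DefaultHandler</code> in Java, which throws the exception it receives; recording it first lets the caller inspect what went wrong.</p>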
<p>I hope you find this useful.<br />
Until next time.</p>
<p><b>More info</b></p>
<ul>
<li><b><a href="http://www.saxproject.org/" target="_blank">Official SAX website</a></b></li>
<li><b><a href="http://java.sun.com/javase/6/docs/api/org/xml/sax/XMLReader.html">Javadoc XMLReader</a></b></li>
<li><b><a href="http://java.sun.com/javase/6/docs/api/org/xml/sax/helpers/DefaultHandler.html" target="_blank">Javadoc DefaultHandler</a></b></li>
<li><b><a href="http://java.sun.com/javase/6/docs/api/org/xml/sax/ContentHandler.html" target="_blank">Javadoc ContentHandler</a></b></li>
<li><b><a href="http://java.sun.com/javase/6/docs/api/org/xml/sax/ErrorHandler.html" target="_blank">Javadoc ErrorHandler</a></b></li>
<li><b><a href="http://java.sun.com/javase/6/docs/api/org/xml/sax/helpers/XMLReaderFactory.html" target="_blank">Javadoc XMLReaderFactory</a></b></li>
</ul>
]]></content:encoded>
</item>
<item>
<title><![CDATA[Sharing about my TA (final project)...]]></title>
<link>http://miakamayani.wordpress.com/?p=13</link>
<pubDate>Wed, 20 Feb 2008 08:26:22 +0000</pubDate>
<dc:creator>miakamayani</dc:creator>
<guid>http://miakamayani.wordpress.com/?p=13</guid>
<description><![CDATA[This afternoon I planned to start working on my TA, but I got bored, so I ended up writing a blog post hahaha :D. Here I]]></description>
<content:encoded><![CDATA[<p>This afternoon I meant to start working on my TA (final project), but I got bored, so I'm writing a blog post instead hahaha :D. Here I'd like to share something about my TA, "Text Summarization". As it happens, there are many freely downloadable tools for text summarization; the modules typically needed for natural language processing applications include:</p>
<p>1. POS tagger, e.g. the Brill Tagger. A part-of-speech (POS) tagger assigns each word its grammatical category (its part of speech); in English, for example: noun, verb, preposition, adjective, etc.</p>
<p>2. Parser, e.g. the Collins parser, the Charniak parser, etc. A parser builds a parse tree, i.e. the tree structure of a string/sentence. If you know CFGs: the root is the start symbol (a non-terminal), and the children of each node are the right-hand side (RHS) of the rule whose left-hand side (LHS) is that node. The internal nodes are phrase labels and POS tags, and the leaves are the words (terminals).</p>
<p>3. Natural language generator, e.g. Nitrogen, Carmel, Tiburon, etc. (www.isi.edu). These tools reduce trees so that they are easier to compute with, and they also provide scoring for PCFGs, word bigrams, etc., whose parameters are usually obtained from training.</p>
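<p>To make the CFG description concrete, here is a minimal Python sketch: a hypothetical toy grammar and a naive top-down, backtracking parser (not the Collins or Charniak parser, which are statistical). Each tree node is a pair (label, children): the root is the start symbol S, internal nodes are non-terminals, the nodes just above the leaves are POS tags (Det, N, V), and the leaves are the words.</p>

```python
# Hypothetical toy grammar: each non-terminal (LHS) maps to a list of
# alternative right-hand sides (RHS). Anything not a key is a terminal (word).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
    "Det": [["the"], ["a"]],
    "N": [["dog"], ["cat"]],
    "V": [["sees"], ["sleeps"]],
}


def parse(symbol, words, i):
    """Yield (tree, next_index) for each way `symbol` can derive words from i."""
    if symbol not in GRAMMAR:
        # Terminal: it must match the next word exactly.
        if i < len(words) and words[i] == symbol:
            yield symbol, i + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from expand(symbol, rhs, words, i, [])


def expand(lhs, rhs, words, i, children):
    """Match the RHS symbols left to right, backtracking over alternatives."""
    if not rhs:
        yield (lhs, children), i
        return
    for child, j in parse(rhs[0], words, i):
        yield from expand(lhs, rhs[1:], words, j, children + [child])


def parse_sentence(sentence):
    """Return the first parse tree covering the whole sentence, or None."""
    words = sentence.split()
    for tree, end in parse("S", words, 0):
        if end == len(words):
            return tree
    return None
```

<p>For "the dog sees a cat" it returns S over an NP (Det the, N dog) and a VP (V sees, NP (Det a, N cat)); a sentence the grammar cannot derive, such as "dog the sleeps", yields <code>None</code>.</p>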
<p>I'm very grateful that there turn out to be so many smart people out there who have already built tools I need for my TA hahaha :D. At first I thought it would be impossible if I had to build every module from scratch. I really hope NLP becomes a topic of interest among Informatics students, because so far TA topics in NLP are still rare. Yet NLP is really interesting (shameless plug :p). Smart people elsewhere have built plenty of NLP applications for their native languages (Spanish, French, Chinese, Japanese, English), but there is nothing at all for Indonesian yet. My own TA still uses English, though :p. Hopefully someone will be interested in extending it to Indonesian. In my opinion NLP has very promising prospects: many human needs in information technology can be served by NLP applications, for example summarizing a textbook or online news articles. People don't like having to read everything first, right? With summarization, the information provided should be representative of the original document. So, interested in NLP? :p</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[February 14, 2007 - modifying the abstract class item_list]]></title>
<link>http://xtof78000.wordpress.com/2008/02/14/14-fevrier-2007-modification-de-la-classe-abstraite-item_list/</link>
<pubDate>Thu, 14 Feb 2008 12:44:00 +0000</pubDate>
<dc:creator>xtof78000</dc:creator>
<guid>http://xtof78000.wordpress.com/2008/02/14/14-fevrier-2007-modification-de-la-classe-abstraite-item_list/</guid>
<description><![CDATA[To be able to use item_list as a superclass of microarray_list, it must be possible to ]]></description>
<content:encoded><![CDATA[<p>To use item_list as the superclass of microarray_list, item_list has to be able to parse the GPR file, but this class parses files in a very simple way: a tab-delimited file with one header line, and that's it.<br />So the idea is to create a parser object holding all the information about the file to parse. The Parser class will have attributes for the file format, the field separator, the text delimiter (optional), and the comment and/or metadata markers.<br /><span style="font-weight:bold;color:rgb(255, 0, 0);">14:13</span>: how should I proceed? I'll start by creating the class and see from there.<br /><span style="color:rgb(255, 0, 0);font-weight:bold;">16:00</span>: the Parser class has been created, with a constructor and the appropriate methods:</p>
<pre style="color:rgb(0, 153, 0);"><code>class Parser(object):
    '''
    Parser class: defines the information needed to parse a file.
    '''
    # Allowed attributes and their default values.
    attr_list = {
        'field_separator' : '\t',
        'text_separator' : '',
        'comment_mark' : '#',
        'metafield_marker' : '=',
        'metafield_comment' : False,
        'header_line' : True,
        'format' : None
        }
    # Preset for CSV files.
    csv = {
        'field_separator' : '\t',
        'text_separator' : '',
        'comment_mark' : '',
        'metafield_marker' : '',
        'metafield_comment' : False,
        'header_line' : True,
        'format' : 'csv'
        }
    # Preset for GPR files.
    gpr = {
        'field_separator' : '\t',
        'text_separator' : '',
        'comment_mark' : '#',
        'metafield_marker' : '=',
        'metafield_comment' : False,
        'header_line' : True,
        'format' : 'gpr'
        }

    def __init__(self, **arg):
        '''
        x = Parser(**arg)
        authorized arguments:
        -field_separator (default TAB)
        -text_separator (default: none)
        -comment_mark (default #)
        -metafield_marker (default =)
        -metafield_comment (default False)
        -header_line (default True)
        -format (default None)
        '''
        self.set_default()
        for k, v in arg.items():
            try:
                self.set_attr(k, v)
            except ParserError as e:
                print(e)
        # A named format overrides the individual options set above.
        if self.format:
            self.set_format(self.format)

    def set_default(self):
        '''
        x.set_default() sets the attributes to their default values
        '''
        self.set_format("attr_list")

    def get_format(self, format_name):
        '''
        fmt = x.get_format(format_name) returns the parameters of the given format_name
        '''
        if hasattr(self, format_name):
            return getattr(self, format_name)
        raise ParserError("user format %s is not defined" % format_name)

    def set_format(self, format_name):
        '''
        x.set_format(format_name) sets the attributes to the values described by the given format_name
        '''
        try:
            fmt = self.get_format(format_name)
        except ParserError as e:
            print(e)
            return
        for k, v in fmt.items():
            try:
                self.set_attr(k, v)
            except ParserError as e:
                print(e)

    def set_attr(self, k, v):
        '''
        x.set_attr(k, v) sets the attribute k to v
        '''
        if k in self.attr_list:
            setattr(self, k, v)
        else:
            raise ParserError("option %s doesn't exist" % k)
</code></pre>
<p>Along the way, the ParserError class was created:</p>
<pre style="color:rgb(0, 153, 0);"><code>class ParserError(Exception):
    def __init__(self, msg):
        Exception.__init__(self, msg)
        self.msg = msg

    def __str__(self):
        return self.msg
</code></pre>
<p>With this Parser class, I will now define a method in Item_list that parses files according to these parameters. The hardest part will be handling the metadata; I'm thinking of creating a metadata attribute as a dictionary. The problem with the GPR file is that it has 2 lines at the top without any comment marker. My idea was to read the file looking for comments or metadata and, once all of those have been found, to treat the remaining lines as data.<br />Solution 1: to skip the 2 lines at the top of the GPR file, I'll add a 'start_line' attribute with a value of 3 for a GPR file, 1 for CSV, and 1 by default.<br />Solution 2: each kind of element should be identified by the presence or absence of the other elements.<br /><span style="color:rgb(255, 0, 0);font-weight:bold;">18:27:</span> The parser is finished, but I'm not entirely happy with the function that identifies the elements. It relies only on the presence of the markers in the line, so if the markers are mixed, bugs are likely. This parser will probably need reworking to make it a bit more generic.</p>
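<p>The two solutions above can be combined in a short sketch. The function <code>parse_lines</code> below is hypothetical (it is not the item_list method described in this post) and takes the same options as the Parser class plus the proposed <code>start_line</code>. Following the second solution, a line counts as metadata only if it contains the metafield marker <em>and</em> no field separator, so a data row that happens to contain "=" is not mistaken for metadata.</p>

```python
def parse_lines(lines, field_separator='\t', comment_mark='#',
                metafield_marker='=', header_line=True, start_line=1):
    """Hypothetical sketch: classify lines as comments, metadata, header, data.

    start_line is 1-based, as proposed in the post: earlier lines are skipped.
    Returns (comments, metadata_dict, header_fields, data_rows).
    """
    comments, metadata, header, data = [], {}, None, []
    for line in lines[start_line - 1:]:
        line = line.rstrip('\r\n')
        if not line.strip():
            continue  # ignore blank lines
        if comment_mark and line.startswith(comment_mark):
            comments.append(line[len(comment_mark):].strip())
        elif (metafield_marker and metafield_marker in line
              and field_separator not in line):
            # Metadata: marker present, field separator absent.
            key, value = line.split(metafield_marker, 1)
            metadata[key.strip().strip('"')] = value.strip().strip('"')
        elif header_line and header is None:
            header = line.split(field_separator)  # first remaining line
        else:
            data.append(line.split(field_separator))
    return comments, metadata, header, data
```

<p>For a GPR-like file, calling it with <code>start_line=3</code> skips the two unmarked lines at the top. Lines that mix markers can still be misclassified, which is the fragility noted at 18:27.</p>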
]]></content:encoded>
</item>

</channel>
</rss>
