Carabao Language Kit

A customizable language construction framework
Publisher: Digital Sonata Pty Ltd
Category: Language
License: freeware
Cost: $0
Size: 115.14 MB
Updated: 17 Aug 2009
Carabao is a family of multipurpose linguistic tools. It provides the following capabilities:

* Sense disambiguation
* Detailed, sentence by sentence domain extraction
* Deep morphological analysis and synthesis
* Automatic linguistic profiling
* Idiom extraction
* Universal measure conversion
* Transliteration between scripts
* Machine readability evaluation of texts
* Automatic translation between languages

The most distinctive feature of Carabao is that its source code is completely abstracted from any particular language. All the linguistic logic resides in a database, which comes with a powerful GUI data editor. Removing the linguistic logic from the source code achieves several goals:

* Separation of tasks between software developers and linguists
* Faster and more reliable development of new linguistic engines, without requiring the participation of IT staff
* Ease of programmatic use and customization
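
The idea of keeping linguistic logic in a database rather than in source code can be illustrated with a small sketch. This is not Carabao's schema or API: the table `plural_rules`, its columns, and the functions `build_rule_db` and `pluralize` are hypothetical, and the rules are deliberately incomplete toy data. The point is only that the engine code stays generic while the language-specific behavior lives in rows that a linguist could edit.

```python
import sqlite3

def build_rule_db():
    """Create an in-memory database holding toy pluralization rules (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE plural_rules "
        "(lang TEXT, suffix TEXT, replacement TEXT, priority INTEGER)"
    )
    db.executemany(
        "INSERT INTO plural_rules VALUES (?, ?, ?, ?)",
        [
            ("en", "y", "ies", 2),   # city -> cities
            ("en", "s", "ses", 2),   # bus -> buses
            ("en", "", "s", 1),      # default: append "s"
            ("es", "z", "ces", 2),   # lápiz -> lápices
            ("es", "", "s", 1),      # default: append "s"
        ],
    )
    return db

def pluralize(db, lang, word):
    """Apply the highest-priority rule whose suffix matches; no language logic in code."""
    rows = db.execute(
        "SELECT suffix, replacement FROM plural_rules "
        "WHERE lang = ? ORDER BY priority DESC",
        (lang,),
    )
    for suffix, replacement in rows:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word  # no rules for this language: leave the word unchanged

db = build_rule_db()
print(pluralize(db, "en", "city"))
print(pluralize(db, "es", "lápiz"))
```

Supporting a new language in this sketch means inserting rows, not changing `pluralize`, which mirrors the separation of tasks between developers and linguists described above.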

Version (May 2010)
* Handling of control priority greater than 2 when some of the members have no feasible agreement graph. As a result, some parts of the sequence worked and some didn't.
* Truncation of very long sentences

* A utility to validate and correct rule unit values
* Generic support for formatted processing (e.g. HTML, XML, SGML), including embedded formatting elements in the text flow
* Automatic conversion of double-byte space characters into standard single-byte spaces
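
As an illustration of the space conversion listed above, here is a minimal sketch. It assumes the "double-byte space" is the ideographic space U+3000 commonly found in CJK text; the function name `normalize_spaces` is hypothetical and is not part of Carabao.

```python
# Assumption: the double-byte space is U+3000 (ideographic space).
IDEOGRAPHIC_SPACE = "\u3000"

def normalize_spaces(text: str) -> str:
    """Replace every ideographic space with a standard ASCII space."""
    return text.replace(IDEOGRAPHIC_SPACE, " ")

print(normalize_spaces("東京\u3000タワー"))
```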

* Regular expressions for segmentation into character classes for double-byte languages
* Perl-compatible regular expressions have been introduced for unknown heuristics
* Frequency-based backtracking added to the tokenization algorithm
Version (Feb 2009)
* Regression: "phantom capitalization" of re-used words
* Regression: sequence style forcing / avoiding
* Repositioning errors in sentences with attached tokens
* Sequence processing in languages not using white spaces

* Lattice-based processing for speech recognition an
Version (Dec 2008)
* Handling of single quotes as syntax delimiters in English

* A segmentation mode more effectively handling languages that don't use white spaces (e.g. Chinese, Japanese, Korean, Thai). In this mode, different character classes are broken into tokens (e.g. Chinese, and t
Version (Sep 2008)
* Unknown patterns were translated as hypernyms
* Regression: certain category-based sequences were omitted on second execution because of a malfunctioning guess scan caching mechanism
* In analytical mode (Carabao DeepAnalyzer), there was a mismatch between the word index number and the idiom member index in sentences with attached tokens such as 'em, 'm
* When copying a token with one rule unit or fewer, the text is always reset to the original

* Capability to match numbers as patterns
* When a translation is not found, the engine tries to fall back to a matching hypernym instead
* New methods to Carabao DeepAnalyzer that enable accessing the members of the detected idioms
* New methods to Carabao CDA that enable accessing the unknown heuristics table
* New sequences
* Russian morphological exceptions
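
The hypernym fallback listed above can be sketched as a walk up a hypernym chain until a translatable term is found. The dictionaries `TRANSLATIONS` and `HYPERNYMS` and the function `translate` are hypothetical toy data and names, not Carabao's data model.

```python
# Hypothetical toy data: English -> Spanish translations and a hypernym chain.
TRANSLATIONS = {"dog": "perro", "animal": "animal"}
HYPERNYMS = {"poodle": "dog", "dog": "animal"}

def translate(word):
    """Return a translation, falling back to the nearest translatable hypernym."""
    seen = set()  # guards against cycles in the hypernym data
    current = word
    while current is not None and current not in seen:
        if current in TRANSLATIONS:
            return TRANSLATIONS[current]
        seen.add(current)
        current = HYPERNYMS.get(current)
    return None  # nothing translatable anywhere in the chain

print(translate("poodle"))
```

Here "poodle" has no translation of its own, so the sketch returns the translation of its hypernym "dog".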

* If an "unknown pattern" is forced to match a known word, it will not create a new guess if a guess with the same hypernym already exists.
For example, if you force a check of whether a known word can be a city, a new record will not be created if there is already a guess with a known city
* Automatic input language switching in locator fields
* Locator fields are pre-filled with the list of all existing languages in the database, eliminating the need to jump to the next language
Version (Mar 2008)
* Crash when using sequence extraction option (regression from

* Capability to import sequences by data entry directly from the Sequence Sheet
* Capability to manually set sequence descriptions
* Some sequences

* Processing speed and memory consumption - further boost
* Token GUI

* Volatility of newly assigned rule units in late sequences
* Inconsistencies in the generation of inflected forms in design time

* All (or nearly all) of the Russian morphological exceptions - over 1,000 new prefixes
* Friendly GUI of meta-rules such as lemmatized forms and generation of inflected forms
* MorphoLogic now inspects the design time data generation meta-rules when generating inflected forms

* Processing speed and memory consumption
* Increased maximum length of the meta-rule content field
* Increased the size of some fields to accommodate large sequences and large amounts of grammatical data

* Various tagging problems
* A bug in the priority setting of mid-sentence sequences

* A button to tag new entries morphologically
* A handful of commonly used business entities (e.g., address, phone, fax, business hours)

* Accuracy of sequences
* Domains

* Inflection generation problems of TagLemma results (words not in the dictionary) in Carabao MorphoLogic

* Capability to inspect other guesses. For example, in a sequence like "adverb" + "adverb", it is possible to quickly scrap the entire sequence if the second adverb can also be a preposition
* Comprehensive morphology of the Russian language
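
The guess inspection example above ("adverb" + "adverb", scrapped when the second word can also be a preposition) can be sketched as follows. The `Token` class, its fields, and `find_adverb_pairs` are hypothetical names used only for illustration; they do not reflect how Carabao represents tokens or sequences.

```python
from dataclasses import dataclass, field

@dataclass
class Token:
    text: str
    tag: str                                          # reading currently selected
    other_guesses: set = field(default_factory=set)   # alternative readings

def find_adverb_pairs(tokens):
    """Match adverb + adverb, scrapping the match if the second may be a preposition."""
    pairs = []
    for first, second in zip(tokens, tokens[1:]):
        if first.tag == "adverb" and second.tag == "adverb":
            if "preposition" in second.other_guesses:
                continue  # ambiguous second member: drop the whole sequence
            pairs.append((first.text, second.text))
    return pairs

kept = [Token("very", "adverb"), Token("quickly", "adverb")]
scrapped = [Token("right", "adverb"), Token("inside", "adverb", {"preposition"})]
print(find_adverb_pairs(kept))
print(find_adverb_pairs(scrapped))
```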

* Removed descriptions of negative constraint elements (those that do not have an identity) in sequences in order to make the descriptions less cluttered
* Performance of sequence processing
* Accuracy of sequences
* Domains reviewed

* Various validation problems with attached tokens
* Lookup windows are no longer maximized on opening
* Incorrect tooltips after deletion in the dictionary table

* GUI support for negative constraints in sequences
* Handling of irregular 'smart quotes' in Translation Console
* Manual disambiguation table in Carabao Linguist Edition
* Style tags to the tooltips in the dictionary table

* Supplied sequences
* In the translation console, the original thesaurus article is suppressed when the word is part of an idiom, to prevent confusion

Carabao Language Kit has been released

download (carabaoFree.exe - 115.14 MB)