Aspell Devel Docs

Copyright (c) 2002 Kevin Atkinson
kevina@gnu.org

Contents

  * Notes
  * Copyright
  * 1 Style Guidelines
  * 2 C++ Standard Library
  * 3 Templates
  * 4 Error Handling
  * 5 Source Code Layout
  * 6 Strings
  * 7 Smart Pointers
  * 8 I/O
  * 9 Config Class
  * 10 Filter Interface
  * 11 Data Structures
  * 12 Mk-Src Script
  * 13 GNU Free Documentation License
  * About this document ...

Notes

This manual is designed for those who wish to develop Aspell. It is currently very sketchy. However, it should improve over time. The latest version of this document can be found at http://savannah.gnu.org/download/aspell/manual/devel/devel.html.

The eventual goal is to convert this manual into Texinfo. However, since I do not have the time to learn Texinfo right now, I decided to use something I am already comfortable with. Once someone goes through the trouble of converting it into Texinfo I will maintain the Texinfo version.

Copyright

Copyright (c) 2002 Kevin Atkinson.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".

1 Style Guidelines

As far as coding styles go I am really not that picky. The important thing is to stay consistent. However, whatever you do, please do not indent with more than 4 characters. I find deeper indentation extremely difficult to read because most of the code ends up on the right side of the window.

2 C++ Standard Library

The C++ Standard Library is not used directly except under very specific circumstances. The string class and the STL are used indirectly through wrapper classes, and all I/O is done using the standard C library with lightweight helper classes to make using C I/O a bit more C++ like.
However, the new, new[], delete and delete[] operators are used to allocate memory when appropriate.

3 Templates

Templates are used in Aspell when there is a clear advantage to doing so. Whenever you use templates, please use them carefully and try very hard not to create code bloat by generating a lot of unnecessary and duplicate code.

4 Error Handling

Exceptions are not used in Aspell as I find them more trouble than they are worth. Instead, an alternate method of error handling is used which is based around the PosibErr class. PosibErr is a special error handling device that will make sure that an error is properly handled. It is defined in "posib_err.hpp".

PosibErr is expected to be used as the return type of a function. It will automatically convert to the "normal" return type; however, if the normal return type is accessed and there is an "unhandled" error condition, it will abort. It will also abort if the object is destroyed with an "unhandled" error condition. This includes ignoring the return value of a function that returns an error condition. An error condition is handled by simply checking for the presence of an error, calling ignore, or taking ownership of the error.

The PosibErr class is used extensively throughout Aspell. Please refer to the Aspell source for examples of using PosibErr until better documentation is written.

5 Source Code Layout

common/
    Common code used by all parts of Aspell.
lib/
    Library code used only by the actual Aspell library.
data/
    Data files used by Aspell.
modules/
    Aspell modules which are eventually meant to be pluggable.
    speller/
        default/
            Main speller module.
    filter/
    tokenizer/
auto/
    Scripts and data files to automatically generate code used by Aspell.
interface/
    Header files and such that external programs should use in order to use the Aspell library.
    cc/
        The external "C" interface that programs should be using when they wish to use Aspell.
prog/
    Actual programs based on the Aspell library.
    The main "aspell" utility is included here.
scripts/
    Misc. scripts used by Aspell.
manual/
examples/
    Example programs demonstrating the use of the Aspell library.

6 Strings

6.1 String

The String class provides the same functionality as the C++ string, except that it has fewer constructors. It also inherits OStream so that you can write to it with the "<<" operator. It is defined in "string.hpp".

6.2 ParmString

ParmString is a special string class that is designed to be used as a parameter for a function that is expecting a string. It is defined in "parm_string.hpp". It will allow either a "const char *" or a "String" to be passed in. It will automatically convert to a "const char *". The string can also be accessed via the "str" method. Usage example:

    void foo(ParmString s1, ParmString s2) {
      const char * str0 = s1;          // implicit conversion to const char *
      unsigned int size0 = s2.size();
      if (s1 == s2 || s2 == "bar") {
        ...
      }
    }
    ...
    String s1 = "...";
    const char * s2 = "...";
    foo(s1, s2);                       // a String and a const char * both work

This class should be used when a string is being passed in as a parameter. It is faster than using "const String &" (as that will create an unnecessary temporary when a const char * is passed in), and is less annoying than using "const char *" (as it doesn't require the c_str() method to be used when a String is passed in).

6.3 CharVector

A character vector is basically a Vector<char>, but it has a few additional methods for dealing with strings which Vector does not provide. Like String, it also inherits OStream so that you can write to it with the "<<" operator. It is defined in "char_vector.hpp". Use it whenever you need a string which is guaranteed to be in a contiguous block of memory which you can write to.

7 Smart Pointers

Smart pointers are used extensively in Aspell to simplify memory management tasks and to avoid memory leaks.

7.1 CopyPtr

The CopyPtr class makes a deep copy of an object whenever it is copied. The CopyPtr class is defined in "copy_ptr.hpp".
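
As a rough illustration of what "deep copy on copy" means, the following is a minimal sketch of a pointer with that behaviour. It is not Aspell's CopyPtr: the class name DeepCopyPtr, its members, and the helper copies_are_independent are all hypothetical, and only the copy semantics described above are modelled.

```cpp
#include <cassert>

// Hypothetical stand-in for illustration only; NOT Aspell's CopyPtr.
template <typename T>
class DeepCopyPtr {
public:
  explicit DeepCopyPtr(T * p = nullptr) : ptr_(p) {}

  // Copying the pointer copies the object it points to.
  DeepCopyPtr(const DeepCopyPtr & other)
    : ptr_(other.ptr_ ? new T(*other.ptr_) : nullptr) {}

  // Assignment also makes a fresh copy, then releases the old object.
  DeepCopyPtr & operator=(const DeepCopyPtr & other) {
    if (this != &other) {
      T * fresh = other.ptr_ ? new T(*other.ptr_) : nullptr;
      delete ptr_;
      ptr_ = fresh;
    }
    return *this;
  }

  ~DeepCopyPtr() { delete ptr_; }

  T & operator*() const { assert(ptr_); return *ptr_; }
  T * get() const { return ptr_; }

private:
  T * ptr_;
};

// Usage: each copy owns its own object, so changing one
// does not affect the other.
inline bool copies_are_independent() {
  DeepCopyPtr<int> a(new int(1));
  DeepCopyPtr<int> b(a);   // deep copy: b points to a new int holding 1
  *b = 2;                  // modifies only b's object
  return *a == 1 && *b == 2 && a.get() != b.get();
}
```

Note that this sketch needs the complete type T wherever it is copied or destroyed, which is the kind of constraint a split between a declaration header and an implementation header can relax.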
The "copy_ptr.hpp" header should be included wherever CopyPtr is used. The type CopyPtr points to does not need to be completely defined at this point. The implementation is defined in "copy_ptr-t.hpp". The implementation header file should be included at a point in your code where the class CopyPtr is pointing to is completely defined.

7.2 ClonePtr

ClonePtr is like CopyPtr except that the clone() method is used instead of the copy constructor to make copies of an object. It is defined in "clone_ptr.hpp" and implemented in "clone_ptr-t.hpp".

7.3 StackPtr

A StackPtr is designed to be used whenever the only pointer to an object allocated with new is on the stack. It is similar to the standard C++ auto_ptr but the semantics are a bit different. It is defined in "stack_ptr.hpp". Unlike CopyPtr or ClonePtr, it is both defined and implemented in this header file.

7.4 GenericCopyPtr

A generalized version of CopyPtr and ClonePtr, on which the two are based. It is defined in "generic_copy_ptr.hpp" and implemented in "generic_copy_ptr-t.hpp".

8 I/O

Aspell does not use C++ I/O classes and functions in any way, since they do not provide a way to get at the underlying file number and can often be slower than the highly tuned C I/O functions found in the standard C library. However, some lightweight wrapper classes are provided so that standard C I/O can be used in a more C++ like way.

8.1 IStream/OStream

These two base classes mimic some of the functionality of the corresponding C++ classes. They are defined in "istream.hpp" and "ostream.hpp" respectively. They are, however, based on standard C I/O and are not proper C++ streams.

8.2 FStream

Defined in "fstream.hpp".

8.3 Standard Streams

CIN/COUT/CERR. Defined in "iostream.hpp".

9 Config Class

The Config class is used to hold configuration information. It has a set of keys which it will accept. Inserting or even trying to look at a key that it does not know will produce an error.
It is defined in "common/config.hpp".

10 Filter Interface

10.1 Overview

In Aspell there are 5 types of filters:

 1. Decoders, which take input in some standard format such as iso8859-1 or UTF-8 and convert it into a string of FilterChars.
 2. Decoding filters, which manipulate a string of FilterChars by decoding the text in some way, such as converting an SGML character into its Unicode value.
 3. True filters, which manipulate a string of FilterChars to make it more suitable for spell checking. These filters generally blank out text which should not be spell checked.
 4. Encoding filters, which manipulate a string of FilterChars by encoding the text in some way, such as converting certain Unicode characters to SGML characters.
 5. Encoders, which take a string of FilterChars and convert it into a standard format such as iso8859-1 or UTF-8.

Which types of filters are used depends on the situation:

 1. When decoding words for spell checking:
      + The decoder to convert from a standard format
      + The decoding filter to perform high level decoding if necessary
      + The encoder to convert into an internal format used by the speller module
 2. When checking a document:
      + The decoder to convert from a standard format
      + The decoding filter to perform high level decoding if necessary
      + A true filter to filter out parts of the document which should not be spell checked
      + The encoder to convert into an internal format used by the speller module
 3. When encoding words such as those returned for suggestions:
      + The decoder to convert from the internal format used by the speller module
      + The encoding filter to perform high level encodings if necessary
      + The encoder to convert into a standard format

A FilterChar is a struct defined in "common/filter_char.hpp" which contains two members: a character and a width. Its purpose is to keep track of the width of the character in the original format.
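
The idea of carrying a width with each character can be sketched as follows. This is a simplified, hypothetical model rather than the real struct in "common/filter_char.hpp": the names SketchFilterChar, original_offset and demo_offset_after_entity, the member types, and the zero-based offsets are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplification of a FilterChar: a character plus the
// number of characters it occupied in the original, unfiltered text.
struct SketchFilterChar {
  unsigned int chr;
  unsigned int width;
};

// Returns the zero-based offset in the ORIGINAL text of the filtered
// character at position `index`, by summing the widths before it.
inline std::size_t original_offset(const std::vector<SketchFilterChar> & text,
                                   std::size_t index) {
  assert(index <= text.size());
  std::size_t offset = 0;
  for (std::size_t i = 0; i < index; ++i)
    offset += text[i].width;
  return offset;
}

// Usage: a quote character that came from a six-character entity,
// followed by two ordinary characters.
inline std::size_t demo_offset_after_entity() {
  std::vector<SketchFilterChar> text;
  SketchFilterChar quote = { '"', 6 };
  SketchFilterChar a = { 'a', 1 };
  SketchFilterChar b = { 'b', 1 };
  text.push_back(quote);
  text.push_back(a);
  text.push_back(b);
  // In the filtered text 'a' is at index 1; in the original it starts
  // after the six characters of the entity.
  return original_offset(text, 1);
}
```

Summing widths is linear in the position. A real implementation could avoid rescanning, but that is beyond what this sketch tries to show.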
Tracking the original width is important because, when a misspelled word is found, the exact location of the word needs to be returned to the application so that it can highlight it for the user. For example, if the filters translated this:

    Mr. foo said &quot;I hate my namme&quot;.

to this:

    Mr. foo said "I hate my namme".

without keeping track of the original width of the characters, the application will likely highlight "e my " as the misspelling because the spell checker will return 25 as the offset instead of 30. However, by keeping track of the width using FilterChar, the spell checker will know that the real position is 30, since the quote is really 6 characters wide. In particular, the text will be annotated something like the following:

    1111111111111611111111111111161
    Mr. foo said "I hate my namme".

The standard encoder and decoder filters are defined in "common/convert.cpp". There should generally not be any need to deal with them, so they will not be discussed here. The other three filters, the encoding filter, the true filter, and the decoding filter, are all defined in exactly the same way: they inherit from the IndividualFilter class.

10.2 Adding a New Filter

To add a new filter, create a new file in the modules/filter directory. The file should be a C++ file and end in ".cpp". The file should contain a new filter class inherited from IndividualFilter, a function to return a new filter, and an optional KeyInfo array for adding options to control the behavior of the filter. The file then needs to be added to Makefile.am so that the build system knows about the filter, and lib/new_filter.cpp must be modified so that Aspell knows about the filter.

10.3 IndividualFilter class

All filters are required to inherit from the IndividualFilter class found in "indiv_filter.hpp". See that file for more details and the other filter modules for examples of how it is used.

10.4 Constructor Function

After the class is created, a function must be created which will return a new filter allocated with new.
The function must have the following prototype:

    IndividualFilter * new_«filter_name»

Filters are defined in groups where each group contains an encoding filter, a true filter, and a decoding filter. Only one of them is required to be defined; however, they all need a separate constructor function.

10.5 Config Options

A filter group may have any number of options associated with it as long as they all start with the filter name. See the TEX and SGML filters for examples of what to do and "config.hpp" for the definition of the KeyInfo struct.

10.6 Makefile Modifications

After the new file is created, simply add the file to the "libaspell_filter_standard_la_SOURCES" line in "modules/filter/Makefile.am" so that the build system knows about it.

10.7 New_filter Modifications

Finally, modify "lib/new_filter.cpp" so that Aspell knows about the new filter. Follow the example there for the other filter modules. The filter_modules array should only be modified if your filter has config options.

11 Data Structures

Whenever possible you should try to use one of the data structures available. If the data structures do not provide enough functionality for your needs, you should consider enhancing them rather than writing something from scratch.

11.1 Vector

The Vector class is defined in "vector.hpp" and works the same way as the standard STL vector does, except that it doesn't have as many constructors.

11.2 BasicList

BasicList is a simple list structure which can be implemented as either a singly or a doubly linked list. It is defined in "basic_list.hpp".

11.3 StringMap

StringMap is an associative array for strings. You should try to use this whenever possible to avoid code bloat. It is defined in "string_map.hpp".

11.4 Hash Tables

Several hash tables are provided for when StringMap is not appropriate. These hash tables provide a hash_set, hash_multiset, hash_map and hash_multimap which are very similar to SGI STL's implementation, with a few exceptions.
They are defined in "hash.hpp".

11.5 BlockSList

BlockSList provides a pool of nodes which can be used for singly linked lists. It is defined in "block_slist.hpp".

12 Mk-Src Script

A good deal of interface code is automatically generated by the "mk-src.pl" Perl script. I am doing it this way to avoid having to write a lot of repetitive code for the C++ interface. This should also make adding interfaces for other languages a lot less tedious, and will allow the interface to automatically take advantage of new Aspell functionality as it is made available. The "mk-src.pl" script uses "mk-src.in" as its input.

12.1 mk-src.in

The format of mk-src.in is as follows:

The following characters are literals: { } / '\ ' \n = > := (\n)+ := :\ {\n
\n} | <>
:= \n /\n := (