Software development

Foreword
This document is a summary of our beliefs and experience. We enumerate different methods, structures and algorithms useful for software deployment. We believe that they are not apples and oranges: a design must have a skeleton, and this skeleton can be an algorithm, a data structure, or come from a requirement analysis. We can build our design from reusable objects or from design patterns.
Though it is not possible to use all methodologies, it is possible to know many of them and to follow their spirit rather than their letter. We designed this page to keep useful links at hand.

Scope
We focus on Web languages: C#, Java, PHP. These languages are interpreted or precompiled and embed a rich set of powerful libraries.

Business Case
Developers should understand the business case of their project.
A business case includes:
Virtual machine
Java and .NET rely on a virtual machine infrastructure. At precompilation time, the source code is translated into a sort of portable assembly language with its own instruction set and stack. The virtual machine emulates the stack and interprets the pseudo-code. From a performance point of view this has important consequences:
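To make the interpretation model concrete, here is a minimal sketch of a stack-based interpreter loop in Java. The two opcodes are made up for illustration; the real JVM and CLI instruction sets are of course far richer.

// Minimal sketch of a stack-based interpreter (hypothetical opcodes, not real JVM/CLI bytecode).
public class TinyVm {
    static final int PUSH = 0; // push the next word onto the stack
    static final int ADD  = 1; // pop two words, push their sum

    static int run(int[] code) {
        int[] stack = new int[16];
        int sp = 0;                       // stack pointer, emulated by the VM
        for (int pc = 0; pc < code.length; pc++) {
            switch (code[pc]) {
                case PUSH: stack[sp++] = code[++pc]; break;
                case ADD:  stack[sp - 2] += stack[sp - 1]; sp--; break;
            }
        }
        return stack[sp - 1];             // top of stack holds the result
    }

    public static void main(String[] args) {
        // Equivalent of "2 + 3": PUSH 2, PUSH 3, ADD
        System.out.println(run(new int[] {PUSH, 2, PUSH, 3, ADD}));  // prints 5
    }
}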
Optimization
Garbage collection is efficient for short-lived objects with a small number of references. Therefore the best approach is:
Java
You can find the Java specification here. This article describes the Java garbage collection.

.NET
The .NET VM manages code and implements the Common Language Infrastructure (CLI) specification. You can start with this Overview. The Common Language Infrastructure is standardized by ECMA. You can read the specification in CLI partitions I to V. This article describes garbage collection in the CLI.

Algorithms
In this section we list:
The Introduction to Algorithms of Cormen, Leiserson, Rivest and Stein was a great source of inspiration for this section. You can also check Wayne's Little Data Structures and Algorithms Library.

Sort
Fintan Culwin's Learn Ada on the Web Algorithms describes sort algorithms. Jon Bentley and Robert Sedgewick present here fast Algorithms for Sorting and Searching Strings. Remember that RDBMS are sort engines. To sort elements you can use:

Data structures
Regarding data structures, don't try to reinvent the wheel:
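As a quick illustration of relying on the platform rather than hand-written code, here is a small Java sketch; the C# equivalents live in Array.Sort and the System.Collections.Generic classes.

import java.util.*;

public class BuiltIns {
    public static void main(String[] args) {
        int[] numbers = {5, 3, 8, 1};
        Arrays.sort(numbers);                        // sorts in place using the JDK's tuned algorithm
        System.out.println(Arrays.toString(numbers)); // [1, 3, 5, 8]

        List<String> names = new ArrayList<>(List.of("banana", "apple", "cherry"));
        Collections.sort(names);                     // natural ordering
        System.out.println(names);                   // [apple, banana, cherry]

        Map<String, Integer> ages = new HashMap<>(); // hash table, see the Hash tables section below
        ages.put("alice", 30);
        System.out.println(ages.get("alice"));       // 30
    }
}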
Stacks and queues
Stacks and queues are dynamic sets in which the element removed by the DELETE operation is prespecified. In a stack, the element deleted from the set is the one most recently inserted:
the stack implements a last-in, first-out (LIFO) policy. In a queue, the element deleted is the one that has been in the set the longest: the queue implements a first-in, first-out (FIFO) policy. Examples:
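In Java there is no need to implement these by hand; as a brief illustration (one option among several), java.util.ArrayDeque can serve as both a stack and a queue:

import java.util.ArrayDeque;

public class StackAndQueue {
    public static void main(String[] args) {
        ArrayDeque<String> stack = new ArrayDeque<>();
        stack.push("first");                  // LIFO: push and pop work on the head
        stack.push("second");
        System.out.println(stack.pop());      // prints "second"

        ArrayDeque<String> queue = new ArrayDeque<>();
        queue.offer("first");                 // FIFO: offer at the tail, poll from the head
        queue.offer("second");
        System.out.println(queue.poll());     // prints "first"
    }
}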
Linked lists
A linked list is a data structure in which the order is determined by a next pointer in each object. In a doubly linked list, there is also a prev pointer that points to the previous element. Use linked lists when:
Because Java and C# support references, you can implement your own linked list.
However we don't recommend it.

Rooted trees
There are different sorts of trees:
In the simplest case (a binary tree) a node contains three pointers: p pointing to the parent, left pointing to the left child and right pointing to the right child. When a binary tree is unbalanced, the search time may be no better than with a linked list.
Nodes are inserted at a given place in the tree depending on their value. In a binary search tree each node has at most two children: the child on the left branch has a lesser value than the parent and the child on the right branch has a greater value. This means that a search doesn't have to visit the whole tree: starting at the root, if the value we are looking for is less than the value of the current node we jump to the child on the left branch, and if it is greater we jump to the child on the right branch, repeating the comparison at each node until the value is found. B-trees are balanced search trees in the same spirit as red-black trees, but whose nodes may have many children. In a ternary search tree a search compares the current character in the search string with the character at the node. If the search character is less, the search goes to the left child; if the search character is greater, the search goes to the right child. When the search character is equal, though, the search goes to the middle child and proceeds to the next character in the search string. See the Jon Bentley and Bob Sedgewick article and a C implementation for more information. Trees are useful when you primarily need to retrieve keys in a range.
For instance the Java JDK implementation (TreeMap) has a subMap method that returns a view of the portion of the tree whose keys are in a range, and a tailMap method that returns a view of the portion of the tree whose keys are greater than or equal to a given value.
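As a brief illustration with java.util.TreeMap, whose SortedMap views behave as described:

import java.util.SortedMap;
import java.util.TreeMap;

public class RangeQueries {
    public static void main(String[] args) {
        TreeMap<Integer, String> tree = new TreeMap<>();
        tree.put(10, "ten");
        tree.put(20, "twenty");
        tree.put(30, "thirty");

        SortedMap<Integer, String> range = tree.subMap(10, 30);  // keys in [10, 30)
        System.out.println(range);                               // {10=ten, 20=twenty}

        SortedMap<Integer, String> tail = tree.tailMap(20);      // keys >= 20
        System.out.println(tail);                                // {20=twenty, 30=thirty}
    }
}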
Hash tables
A hash table uses an array of slots. An element with key k is stored in array[h(k)] where h is a hash function. The number of possible key values is much higher than the array size, therefore two keys may hash to the same slot; this is a collision. Typically elements that hash to the same slot are stored in a linked list. Use hash tables when you primarily access your data with a key.
Hash tables are especially useful for associative tables.

Heaps
A binomial heap is a set of min-heap binomial trees. In a min-heap binomial tree the key of a node is greater than or equal to the key of its parent. There are also:
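In practice the binary heap is the variant you will use most often; java.util.PriorityQueue, for instance, is backed by a binary min-heap, as this small sketch shows:

import java.util.PriorityQueue;

public class MinHeapExample {
    public static void main(String[] args) {
        // PriorityQueue is a binary min-heap: poll() always returns the smallest element.
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        heap.add(5);
        heap.add(1);
        heap.add(3);
        System.out.println(heap.poll());  // prints 1
        System.out.println(heap.poll());  // prints 3
    }
}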
Optimization
Optimization problems are problems that can have many solutions and where we search for a solution with an optimal value.

Dynamic programming
Dynamic programming applies to problems with an optimal substructure. In that case:
You can use dynamic programming to compute the distance between words or sentences.
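A classic instance, assuming the distance meant here is the Levenshtein (edit) distance, is the following dynamic program: dist[i][j] is the distance between the first i characters of a and the first j characters of b, and each cell depends only on already-computed neighbours (the optimal substructure).

public class EditDistance {
    // Levenshtein distance between a and b, computed bottom-up.
    static int distance(String a, String b) {
        int[][] dist = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) dist[i][0] = i;   // delete all of a
        for (int j = 0; j <= b.length(); j++) dist[0][j] = j;   // insert all of b
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int substitution = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                dist[i][j] = Math.min(dist[i - 1][j - 1] + substitution,  // replace or keep
                             Math.min(dist[i - 1][j] + 1,                 // delete from a
                                      dist[i][j - 1] + 1));               // insert into a
            }
        }
        return dist[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting"));  // prints 3
    }
}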
Note:
You can also use "near-neighbor searching" on ternary search tree as described in the Jon Bentley and Bob Sedgewick paper that we described above. Greedy algorithmsA greedy algorithm is an algorithm that gobbles up all of its favorites first. The idea behind a greedy algorithm is to perform a single procedure in the recipe over and over again until it can't be done any more and see what kind of results it will produce.An example of greedy algorithm is the Dijkstra's algorithm to find the shortest path from a point in a graph to a destination. It allows finding the shortest paths from a given source to all points in a graph in the same time. You can find here and here descriptions of the Dijkstra algorithm. Another example of Greedy algorithm is Huffman compression. However optimal merge is even simpler to understand. Classic compressionHuffman compression is a three step process:
Alexander Simakov wrote a small and clean implementation, shc.
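The heart of Huffman coding is the greedy construction of the code tree from symbol frequencies. Here is a minimal, hedged sketch in Java (symbols and frequencies are made up; a real codec also needs to serialize the tree or the code lengths). Codes are then read off the tree: left edges emit 0, right edges emit 1.

import java.util.PriorityQueue;

public class HuffmanSketch {
    // A node of the Huffman tree: a leaf holds a symbol, an inner node holds two children.
    static class Node {
        final long weight;
        final char symbol;          // meaningful for leaves only
        final Node left, right;
        Node(long weight, char symbol, Node left, Node right) {
            this.weight = weight; this.symbol = symbol; this.left = left; this.right = right;
        }
    }

    // Greedy construction: repeatedly merge the two lightest trees.
    static Node buildTree(char[] symbols, long[] frequencies) {
        PriorityQueue<Node> queue = new PriorityQueue<>((a, b) -> Long.compare(a.weight, b.weight));
        for (int i = 0; i < symbols.length; i++)
            queue.add(new Node(frequencies[i], symbols[i], null, null));
        while (queue.size() > 1) {
            Node a = queue.poll(), b = queue.poll();
            queue.add(new Node(a.weight + b.weight, '\0', a, b));
        }
        return queue.poll();        // root; frequent symbols end up closer to the root
    }

    public static void main(String[] args) {
        Node root = buildTree(new char[] {'a', 'b', 'c'}, new long[] {5, 2, 1});
        System.out.println("total weight = " + root.weight);   // prints total weight = 8
    }
}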
gzip, zlib and PNG use the same algorithm of Jean-loup Gailly and Mark Adler. You can find zlib here and gzip here. WinZip, gzip and zlib use a combination of deflation and Huffman algorithms. The deflation algorithm is an unpatented variation of Lempel-Ziv (which is used in compress and GIF). It finds duplicated strings in the input data. The second occurrence of a string is replaced by a pointer to the previous string. Distances are limited to 32K bytes and lengths are limited to 258 bytes. Literals or match lengths are compressed with one Huffman tree, and match distances are compressed with another tree. Compressed data are in blocks and each block has its own two Huffman trees. The deflation process depends on being able to identify portions of the input text which are identical to earlier input within a sliding window trailing behind the input currently being processed. gzip and zlib try all possible matches and select the longest.

To find matching strings the deflation uses the Karp-Rabin approach. To find if a text contains a pattern, the first idea is to compare the string starting at each position of the text with the pattern. The problem with this solution is that character comparisons are expensive. With Karp-Rabin we first compare the hash of the pattern with the hash of the string, and we only compare the strings if the hashes are equal. This approach is efficient only because we can update rather than recompute the hash. The gzip/zlib compressor uses a chained hash table to find duplicated strings, with a hash function that operates on 3-byte sequences. From RFC 1951: "At any given point during compression, let XYZ be the next 3 input bytes to be examined (not necessarily all different, of course). First, the compressor examines the hash chain for XYZ. If the chain is empty, the compressor simply writes out X as a literal byte and advances one byte in the input. If the hash chain is not empty, indicating that the sequence XYZ (or, if we are unlucky, some other 3 bytes with the same hash function value) has occurred recently, the compressor compares all strings on the XYZ hash chain with the actual input data sequence starting at the current point, and selects the longest match."

gzip and zlib allow specifying whether you prefer intensive or fast compression. It only changes the deflation. Here is the code from zlib deflate.c (Jean-loup Gailly):

/* Values for max_lazy_match, good_match and max_chain_length, depending on
 * the desired pack level (0..9). The values given below have been tuned to
 * exclude worst case performance for pathological files. Better values may be
 * found for specific files.
 */
typedef struct config_s {
   ush good_length; /* reduce lazy search above this match length */
   ush max_lazy;    /* do not perform lazy search above this match length */
   ush nice_length; /* quit search above this match length */
   ush max_chain;
   compress_func func;
} config;

local const config configuration_table[10] = {
/*      good lazy nice chain */
/* 0 */ {0,    0,   0,    0, deflate_stored},  /* store only */
/* 1 */ {4,    4,   8,    4, deflate_fast},    /* maximum speed, no lazy matches */
/* 2 */ {4,    5,  16,    8, deflate_fast},
/* 3 */ {4,    6,  32,   32, deflate_fast},
/* 4 */ {4,    4,  16,   16, deflate_slow},    /* lazy matches */
/* 5 */ {8,   16,  32,   32, deflate_slow},
/* 6 */ {8,   16, 128,  128, deflate_slow},
/* 7 */ {8,   32, 128,  256, deflate_slow},
/* 8 */ {32, 128, 258, 1024, deflate_slow},
/* 9 */ {32, 258, 258, 4096, deflate_slow}};   /* maximum compression */

/* Note: the deflate() code requires max_lazy >= MIN_MATCH and max_chain >= 4
 * For deflate_fast() (levels <= 3) good is ignored and lazy has a different
 * meaning. */

Lazy means here that a match is only adopted if there is no better match when the compressor searches for a longer match at the next input bytes. Not surprisingly, deflate-based compression tools have similar performance - compressed size between 30 and 40% of the original size - whereas pure Huffman compressed size is about 60% of the original size. shcodec is fast and has a very small memory requirement (1280 bytes for encoding and only 574 bytes for decoding) because it is a static Huffman codec. For files with large amounts of the same character (binary zero, blank, ...), adaptive Huffman coding can compress better. With adaptive coding, both the compressor and the decompressor maintain a code tree. Each time they process a character they recompute the code tree, either to insert a new code or to maintain the sibling property: each node (except the root) must have a sibling and the nodes must be listed in order of nonincreasing frequency with each node adjacent to its sibling. Static Huffman encoding has another advantage: you can compute a Huffman tree once and use it over and over. The compressor treats data as a continuous stream and replaces bytes by their variable-length code. You can find an example of such a tree in the Sierra Creative Interpreter. It is good for speed (one-pass compression) but not for size, especially for small files or messages, because the unique Huffman tree must accommodate all possible characters (typically 256); if a small message contains only 50 to 60 distinct characters, each character could be coded with fewer bits. bzip uses Burrows-Wheeler. You can download it from
the Red Hat site.
It has the best compression ratio but uses more memory and CPU than gzip. You can download the original paper of Burrows and Wheeler, A Block-sorting Lossless Data Compression Algorithm. Mark Nelson wrote an article in Dr. Dobb's about the Burrows-Wheeler Transform. Arturo San Emeterio Campos wrote another paper. Millau extends WBXML (from WAP) with separation of structure and content.
Because Millau preserves the element structure of XML, it allows a browser to skip unknown elements and attributes. Millau also treats data as a continuous stream. There are two interesting documents on Millau: Algorithms and Programming Models for Efficient Representation of XML for Internet Applications from Neel Sundaresan and Reshad Moussa, and Millau: an encoding format for efficient representation and exchange of XML over the Web from Neel Sundaresan. We are not aware of available Millau implementations. The only product with sources that we know of is XMILL from AT&T. The distribution includes a very interesting paper, paper/xmill.ps.gz, from Hartmut Liefke and Dan Suciu. XMILL applies three principles to compress data:
Other compression techniques
Mark Nelson wrote a good introduction (with code examples) to Arithmetic Coding and Data Compression. These papers were published in the February 1991 issue of Dr. Dobb's Journal, so they are a bit dated: we have a hundred times more memory, CPU and bandwidth than we had then, and less time to spend designing our programs. Arithmetic coding can be seen as an enhancement of Huffman coding. Huffman coding produces the lowest possible output bit count for the input stream of symbols when using fixed-length codes. The optimal number of bits to be used for each symbol is the log base 2 of (1/p), where p is the probability of a given character. This is a real number, whereas Huffman requires an integer number of bits to encode a symbol. From the second paper of Mark Nelson: "Arithmetic coding provides a way to encode symbols using an optimal number of bits. The number of bits used by each symbol is not necessarily an integer, as would be the case with Huffman coding. In order to use arithmetic coding to compress data, a model for the data is needed. The model needs to be able to do two things to effectively compress data:"
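As a worked example of the log base 2 of (1/p) formula above: a character with probability p = 0.9 ideally needs log2(1/0.9), about 0.15 bit; one with p = 0.5 needs exactly 1 bit; one with p = 0.125 needs 3 bits. Huffman coding must round each of these up to a whole number of bits (at least 1 per symbol), while arithmetic coding can effectively spend fractional bits.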
The higher the probability of the nth character, the less space we need to represent it. This is also true with Huffman coding, but with arithmetic coding, if the probability of a character is 90%, the optimal code size is 0.15 bit and not 1 bit. If we compute the probability of the nth character based on the values of the (n-1)th and (n-2)th characters, the probability is generally higher. We can distinguish between compression models based on the number of characters used in predictions. With an order-0 model, the probability of each symbol is independent of any previous symbols. With an order-1 model, the probability of a symbol depends on the previous symbol, and so on. There is another introduction by Matt Timmermans, Bijective Arithmetic Encoding with Optimal End Treatment. A bijective compressor has the property that any file can be decompressed and then compressed back to its original form, in addition to the normal property that any file can be compressed and then decompressed back to its original form. This property is interesting for encryption: one area of cryptanalysis is to guess the key using patterns in the original data. If the data to encrypt exhibits no pattern, it is much harder to guess the key. Matt Timmermans is the author of BICOM. As we have seen, the capability to predict the next byte is important for compression. The most common technique is Prediction by Partial Matching (PPM). In PPM we call context the last few symbols of the input stream, and we maintain statistical information about each context in order to predict the next symbol. Jesper Larsson wrote: "For each context, a table of symbol counts is dynamically maintained, and the code applied whenever that context occurs is based on the statistics of this table. When a symbol appears in a context for the first time, its count in that context is zero. Still, it must be possible to encode the symbol in that context, so some amount of code space must be reserved for previously unseen events. Therefore, each context also keeps an escape count, used to encode a new symbol event in that context." These explanations help to understand the different algorithms. Dmitry Shkarin wrote a nice paper. He is also the author of PPMD, which exists in different variants. Charles Bloom wrote PPMZ. We compared BICOM, PPMD and PPMZ with gzip and bzip2 using this file (now 84,615 bytes).
Codecs are listed in order of speed: bicom is still fast and ppmz is slow but compresses better. Sources are available for all implementations, so you can adapt them to your needs. Bicom allows applying a Rijndael encryption to the compressed stream.

Miscellaneous about compression

Conclusion
There is a hard problem that we would like to present now: the prediction of the worst-case compression ratio. Is it possible to predict the worst-case compression ratio of a generic algorithm?
The compressed size tends to l(i). Therefore the problem is to estimate i using less CPU than we need to compress the message. The ratio message_size / l(i) is meaningless, whereas the ratio compressed_size / l(i) is meaningful. Let's consider the application of this technique to networks.
Graphs
You can look at the Review of Elementary Graph Theory.
The degree of a vertex is the number of edges incident on it. A path is a sequence of vertices with their connecting edges. A path forms a cycle if its first vertex is also its last vertex. A graph with no cycle is acyclic.

Given a source vertex s, Breadth-First Search (BFS) explores the graph edges to discover the vertices reachable from s and builds a breadth-first tree rooted at s that contains all reachable vertices. For any vertex reachable from s, the path in the breadth-first tree is a path containing the smallest number of edges. To keep track of progress, breadth-first search colors each vertex white, gray or black. Vertices start out white. A discovered vertex all of whose adjacent vertices have been discovered is black; otherwise it is gray. See C++ Boost breadth_first_search for an implementation using (alas!) C++ templates. Breadth-first search crawling yields high-quality pages presents a Web crawling strategy based on BFS.

In depth-first search, edges are explored out of the most recently discovered vertex v that still has unexplored edges leaving it. When all of v's edges have been explored, the search backtracks to explore edges leaving the vertex from which v was discovered. As in breadth-first search, vertices are colored during the search: each vertex is initially white, it is grayed when it is discovered, and it is blackened when its adjacent nodes have been explored. See C++ Boost depth_first_search for an implementation using C++ templates.

The topological sort algorithm creates an ordering of the vertices such that if edge (u,v) appears in the graph, then u comes before v in the ordering. The graph must be a directed acyclic graph (DAG). The implementation consists mainly of a call to depth_first_search(). See C++ Boost topological_sort for an implementation using C++ templates.

The minimum-spanning-tree problem is defined as follows:
Kruskal's algorithm starts with each vertex in a tree by itself, and with no edges in the minimum spanning tree T. The algorithm then examines each edge in the graph in order of increasing edge weight. If an edge connects two vertices in different trees, the algorithm merges the two trees into a single tree and adds the edge to T. Prim's algorithm operates like Dijkstra's shortest-path algorithm.
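To make Kruskal's procedure concrete, here is a hedged sketch in Java; the edge representation and the tiny union-find helper are ours, not taken from any particular library.

import java.util.Arrays;

public class KruskalSketch {
    // An edge (u, v) with a weight; vertices are numbered 0..n-1.
    record Edge(int u, int v, double weight) {}

    // Union-find with path compression: detects whether two vertices are already in the same tree.
    static int find(int[] parent, int x) {
        while (parent[x] != x) { parent[x] = parent[parent[x]]; x = parent[x]; }
        return x;
    }

    static double minimumSpanningTree(int n, Edge[] edges) {
        Arrays.sort(edges, (a, b) -> Double.compare(a.weight(), b.weight()));  // increasing weight
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;     // each vertex starts in its own tree
        double total = 0;
        for (Edge e : edges) {
            int ru = find(parent, e.u()), rv = find(parent, e.v());
            if (ru != rv) {            // different trees: merge them and keep the edge
                parent[ru] = rv;
                total += e.weight();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Edge[] edges = { new Edge(0, 1, 1), new Edge(1, 2, 2), new Edge(0, 2, 3) };
        System.out.println(minimumSpanningTree(3, edges));  // prints 3.0 (edges 0-1 and 1-2)
    }
}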
The Bellman-Ford algorithm solves the single-source shortest-path problem for a graph with both positive and negative edge weights. If you only need to solve the shortest-paths problem for positive edge weights, Dijkstra's algorithm is faster. If all weights equal one, use breadth-first search. The Bellman-Ford algorithm proceeds by looping through all of the edges in the graph, applying the relaxation operation to each edge. In the following pseudo-code, u is a vertex, v is a vertex adjacent to u, w is a map where w[u, v] is the weight of the edge, d is a map where d[u] is the length of the shortest path to u so far, and p is a map where p[u] is the parent of u.

relax(u, v, w, d, p) {
    if (w[u, v] + d[u] < d[v]) {
        d[v] = w[u, v] + d[u];
        p[v] = u;
    }
}

bool bellman-ford(V, w, source, [out]p, [out]d) {
    for each vertex u in V {
        d[u] = infinity;
        p[u] = u;
    }
    d[source] = 0;
    for (int i = 1; i < V.length; ++i) {
        for each edge (u, v) in w
            relax(u, v, w, d, p);
    }
    for each edge (u, v) in w {
        if (w[u, v] + d[u] < d[v])
            return false; // not minimized: negative-weight cycle
    }
    return true;
}

See C++ Boost bellman_ford_shortest_paths for an implementation using C++ templates. If the graph is a directed acyclic graph, we can relax each edge once, in topological order:

topological_sort();
for each vertex u in V {
    d[u] = infinity;
    p[u] = u;
}
d[source] = 0;
for each vertex u in V, taken in topological order {
    for each edge (u, v) leaving u
        relax(u, v, w, d, p);
}

Dijkstra's algorithm finds all the shortest paths from the source vertex to every other vertex by iteratively growing the set of vertices S to which it knows the shortest path. At each step of the algorithm, the next vertex added to S is determined by a priority queue. The queue contains the vertices in the graph but not in S, prioritized by their distance label, which is the length of the shortest path seen so far for each vertex. The vertex u at the top of the priority queue is then added to S, and each of its out-edges is relaxed: if the distance to u plus the weight of the out-edge (u,v) is less than the distance label for v, then the estimated distance for vertex v is reduced. The algorithm then loops back, processing the next vertex at the top of the priority queue. The algorithm finishes when the priority queue is empty. See C++ Boost dijkstra_shortest_paths for an implementation using C++ templates.
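As a hedged sketch of the priority-queue formulation described above (the adjacency-list representation and names are ours):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public class DijkstraSketch {
    record Edge(int to, double weight) {}

    // Returns d[v], the length of the shortest path from source to v (infinity if unreachable).
    static double[] shortestPaths(List<List<Edge>> graph, int source) {
        int n = graph.size();
        double[] d = new double[n];
        Arrays.fill(d, Double.POSITIVE_INFINITY);
        d[source] = 0;
        // Priority queue of {vertex, distance label}, ordered by distance label.
        PriorityQueue<double[]> queue = new PriorityQueue<>((a, b) -> Double.compare(a[1], b[1]));
        queue.add(new double[] {source, 0});
        while (!queue.isEmpty()) {
            double[] top = queue.poll();
            int u = (int) top[0];
            if (top[1] > d[u]) continue;              // stale entry, a better label was already found
            for (Edge e : graph.get(u)) {
                if (d[u] + e.weight() < d[e.to()]) {  // relax the out-edge (u, v)
                    d[e.to()] = d[u] + e.weight();
                    queue.add(new double[] {e.to(), d[e.to()]});
                }
            }
        }
        return d;
    }

    public static void main(String[] args) {
        List<List<Edge>> graph = new ArrayList<>();
        for (int i = 0; i < 3; i++) graph.add(new ArrayList<>());
        graph.get(0).add(new Edge(1, 1));
        graph.get(1).add(new Edge(2, 2));
        graph.get(0).add(new Edge(2, 5));
        System.out.println(Arrays.toString(shortestPaths(graph, 0)));  // [0.0, 1.0, 3.0]
    }
}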
To solve the all-pairs problem, we can use the following (Floyd-Warshall) recurrence:

n = W.length;
D(0) = W;
for (int k = 1; k <= n; ++k) {
    for (int i = 1; i <= n; ++i) {
        for (int j = 1; j <= n; ++j) {
            d(k)ij = min(d(k-1)ij, d(k-1)ik + d(k-1)kj);
        }
    }
}
return D(n);

There are other presentations from Mark Dettinger and from the University of Rochester. Johnson's algorithm either returns a matrix of shortest-path weights for all pairs of vertices or reports that the input graph contains a negative-weight cycle. First, it adds a new node with zero-weight edges from it to all other nodes, and runs the Bellman-Ford algorithm to check for negative-weight cycles and find h(v), the least weight of a path from the new node to node v. Next it reweights the edges using the nodes' h(v) values. Finally, for each node, it runs Dijkstra's algorithm and stores the computed least weight to other nodes, reweighted using the nodes' h(v) values, as the final weight. See C++ Boost johnson_all_pairs_shortest_paths for an implementation using C++ templates.

We can interpret a directed graph as a flow network and use it to answer questions about material flows. In a flow network a source produces the material at some steady rate, and a sink consumes the material at the same rate. An edge in a flow network has a capacity. The network must satisfy two constraints:
f = 0;
while there is an augmenting path p
    augment flow f along p;
return f;

You can find a C implementation of the Ford-Fulkerson method here. Push-relabel algorithms are often faster than the Ford-Fulkerson method.

Linear programming
In the general linear programming problem, we are given an m x n matrix A, an m-vector b, and an n-vector c. We wish to find a vector x of n elements that maximizes an objective function sum over i = 1..n of c_i x_i, subject to the m constraints given by Ax <= b. The constraints form the feasible region. With n = 2 and m = 3, if

A = ( 4 -1 )
    ( 2  1 )
    ( 5 -2 )

c = (1 1) and b = (8 10 6), this is the same as:

maximize x1 + x2 subject to
4 x1 - x2 <= 8
2 x1 + x2 <= 10
5 x1 - 2 x2 <= 6

4 x1 - x2 <= 8 is a constraint. First we convert the constraints into a slack form. We typically introduce new slack variables x_{n+i} and we write: We focus on the basic solution: set all nonbasic variables to zero and compute the values of the basic variables, x_{n+i} = b_i. Subsequent iterations of the simplex algorithm rewrite the set of equations and the objective function so as to put a different set of variables on the right-hand side. We select a nonbasic variable x_j whose coefficient in the objective function is positive, and we increase the value of x_j as much as possible without violating any of the constraints. The variable x_j becomes basic and the x_{n+i} on the left-hand side of the tightest constraint becomes nonbasic. This operation is a pivot. x_j is the entering variable and x_{n+i} is the leaving variable. We rewrite the constraints and the objective function to have only nonbasic variables on the right-hand side. The basic solution objective should improve. If we cannot improve the objective function by a single pivot operation (moving from a basic feasible solution to a neighbouring basic feasible solution), then the current basic feasible solution is optimal. Given a primal linear program as above, we can define a dual linear program as
minimize the sum over i = 1..m of b_i y_i, subject to the n constraints given by (transpose of A) y >= c, with y >= 0.
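As a worked illustration with the small example above (taking the third row of A as (5 -2), consistent with the constraint list), the dual is:

minimize 8 y1 + 10 y2 + 6 y3 subject to
4 y1 + 2 y2 + 5 y3 >= 1
-y1 + y2 - 2 y3 >= 1
y1, y2, y3 >= 0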
Robert J. Vanderbei wrote a free implementation for Windows and Unix that you can download here. It features the two-phase and the self-dual methods with the eta-matrix and the refactorization approaches. SoPlex is an implementation of the revised simplex algorithm. It features primal and dual solving routines for linear programs and is implemented as a C++ class library that can be used with other programs. You are allowed to retrieve SoPlex only for research purposes as a member of a non-commercial and academic institution.

Bayesian networks
Read first A Brief Introduction to Graphical Models and Bayesian Networks by Kevin Patrick Murphy. This page lists Software Packages for Graphical Models / Bayesian Networks. The most important implementations are perhaps:

Reinforcement learning
Reinforcement learning is an extension of the Markov Decision Process (MDP). A Markov decision process, M = {S, A, T, R}, is a description of a synchronous control domain specified by four components:
The goal of the agent is to maximize its aggregated reward over time.

Design
Before diving into methods and patterns, you can read The Cathedral and the Bazaar and visit the Extreme Programming site. OK, it is not the same thing; you can use both extreme programming and a design method... If "release early, release often" has more impact on software quality than thorough object analysis, why should we spend time learning that stuff? Methods exist for three reasons:
UML
UML is a modeling language, not a method. A modeling language is the notation that design methods use to express designs. Because UML is broadly used, it is certainly the best modeling language we can use. To design a product, we also need to apply a process, that is, advice on what steps to take in doing a design.
Processes are necessarily fuzzy and almost never followed because they assume a phasing. Iterative construction addresses this issue: in the initial version, developers don't try to implement all requirements. They release early and often. Users see the product, and it is cheaper and simpler to take corrective actions. The problem is that elaboration also has to be iterative and to go on in parallel with construction; the management must agree on something that nobody can yet describe. To address this issue we organize requirements into a hierarchy. On a medium-size project it is common to have 100 or more requirements. Cognitive psychologists have determined that a human is only capable of maintaining and manipulating cognitive models which contain seven plus or minus two (i.e. between five and nine) chunks of information. Therefore we first identify two to five top-level requirements that we describe as objectives. Then we assign second-level requirements to first-level requirements. A first-level requirement can have seven to nine second-level requirements. At this step we can see how second-level requirements relate to each other. A requirement shouldn't be much harder to satisfy than another. Requirements should also cover the area of possible features well. If that is the case, you have a good chance that further requirements will continue filling the area of possible features. Otherwise the top-level requirement taxonomy probably doesn't represent the need. Then we can make preliminary designs (more than one!) meeting the requirements. Next we select the most capable in terms of evolution by answering the following questions:
During the construction phase, we choose the solution whose impact on the evolution capability is the smallest.

Design patterns
Christopher Alexander says, "Each pattern describes a problem which occurs over and over again in our environment, and then describes the core of the solution to that problem, in such a way that you can use this solution a million times over, without ever doing it the same way twice". We can decompose a design into design problems that design patterns can solve. A pattern has four elements:
Feature-driven development
Feature-Driven Development (FDD) is a recent software-development process. The five processes within FDD are:
Storyboarding
When you have to design a product, identify the key element. If the users will primarily use the product through its graphical interface, then the graphical interface is the key element. UML and design patterns don't help to design graphical interfaces: you must first storyboard your graphical interface. The Art of Storyboarding presents storyboarding. You can also read Storyboarding Multimedia and Storyboarding for Design: An Overview of the Process. Storyboarding allows checking usability before spending man-months in development. Though storyboarding tends to be less formal than UML, it allows better cost estimates. It also allows identifying components that you don't know how to develop.
Some authors suggest that we should wireframe first. They explain that wireframing defines the "what" of the creative process while storyboarding tackles the "how", but at the same time they define wireframing as identifying the entry and exit points the users experience on every page of the site. Yet one goal of storyboards is to identify the pages (the what). We believe that we should - just like in the film industry:
Project Management

Mythical Man-Month
In The Mythical Man-Month (1975-1995) Frederick P. Brooks wrote: "Cost does indeed vary as the product of the number of men and the number of months. Progress does not. Hence the man-month as a unit for measuring the size of a job is a dangerous and deceptive myth. It implies that men and months are interchangeable.
"Men and months are interchangeable commodities only when a task can be partitioned among many workers with no communication among them. This is true of reaping wheat or picking cotton; it is not even approximately true of systems programming.
"When a task cannot be partitioned because of sequential constraints, the application of more effort has no effect on the schedule. The bearing of a child takes nine months, no matter how many women are assigned. Many software tasks have this same characteristic because of the sequential nature of debugging.
"In tasks that can be partitioned but which require communication among the subtasks, the effort of communication must be added to the amount of work to be done. Therefore the best that can be done is somewhat poorer than an even trade of men for months...
"Since software construction is inherently a systems effort -- an exercise in complex interrelationships -- communication effort is great, and it quickly dominates the decrease in individual task time brought about by partitioning. Adding more men then lengthens, not shortens, the schedule."
Following this observation, which was never seriously challenged, we can easily conclude that the safest and cheapest project is the project handled by a single person - the project that doesn't require management. A fact frequently forgotten is that a compiler, gcc, the first PC databases and Linux were originally written by a single programmer, in a timeframe matching user needs. Even today an experienced programmer can develop a cheaper, faster, smaller and more reliable product than a small team. This is not to say that teams, and even big teams, are not needed. New products have to offer more functions and to be friendlier than the products they replace. Therefore they are more complex and require bigger teams. The OS/360 project of Frederick P. Brooks, then one of the most strategic projects of the biggest computer maker (IBM), had a team of 160 programmers. Now in some companies this would be a medium-size project. Therefore project management is more needed than ever.

Project breakdown
Brooks also found an empirical breakdown of software tasks: 1/3 planning, 1/6 coding, 1/4 component test and early system test, 1/4 system test with all components in hand. We can note that this breakdown follows from the project complexity, not from communication and sequential tasks: when the project is complex, you experience this breakdown even when you work alone.

No Silver Bullet
In No Silver Bullet (1986) Brooks wrote that "no single new development in the next ten years would give software developers an order-of-magnitude improvement in productivity, reliability, or simplicity", which also turned out to be true. This is not to say that there were no improvements, but they happened mostly in coding, with better languages and IDEs, and in component tests, with symbolic debuggers; together these represent only 30% of the cost. Quite interestingly, programming and component debugging are also the tasks that can be effectively taught, whereas design and system troubleshooting require experience and remain arts. And what we call experience here is not something that can easily be transferred or modeled.
This has more to do with intuition: the brain magic that allows us to classify facts and to feel that things shouldn't be done in a given way.

Tools
If projects are not released on time, it is not because project management tools (products like Microsoft Project) are bad. Simply, these products handle tasks and leave the definition of these tasks to the user's appreciation. There are three traps:
These problems are not simple ones. For instance, experience comes through a succession of epiphanies rather than through a peaceful journey. A project manager can trust some people because they have already done something similar, but she cannot know whether an unproven programmer will or won't be able to achieve a complex task. Therefore a project manager should be:
Because of that, we believe that the best project management tools are the simplest and most intuitive. Take a look at products like DELEGATOR from Madrigal, a company that provides scheduling software for managing people and resources, with free evaluation versions.