Searching for smells
A few years ago I managed a large software development project for a client in the financial sector. After a few weeks of discussions with the client, and of waiting for the basic information we needed to start, we finally began coding. Within another few weeks we had a first version of the software that fulfilled the first few selected user stories (requirements). Once it compiled without errors and the application started and worked, even in the client’s environment, I decided to search the code for “smells”: places where something bad could happen.
Among other things (no first version of the code is free of imperfections), I noticed that wherever data was to be read, the programmers had assumed that the amount of data (e.g. the number of records in a database) would be limited and relatively small. None of the functions that might have to process many records in a loop was asynchronous, and none of them limited the number of records processed at once.
Asynchronous operations in a web application need careful planning that takes every detail into account, not only the data processing itself. For example, it is good practice to show that something is going on in the background by displaying an hourglass, a progress bar or a similar element. What worried me most, though, was not the missing user interface elements but the fact that the loops in the code had no limits. Whatever the type of loop (for, foreach or while), programmers often forget to give the business layer a way of controlling how long a loop runs and how much work it does. It is worth noting that paging controls are a standard feature of the frameworks used for displaying views (particularly lists and datagrids). And rightly so! What grounds do we have to assume that a table will contain a couple of dozen records rather than millions?
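A minimal sketch, written in Python for brevity, of a loop that controls both the amount of work it does and how long it runs. The helper `process_with_budget` and its parameters are hypothetical names: the caller supplies a maximum item count and a time budget, and any items left over stay in the source iterator for a later batch.

```python
import itertools
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def process_with_budget(
    items: Iterable[T],
    handle: Callable[[T], None],
    max_items: int,
    max_seconds: float,
) -> int:
    """Handle at most `max_items` items, stopping early once `max_seconds` have elapsed.

    Hypothetical helper. Returns the number of items handled; if `items` is an
    iterator, the unhandled items remain in it so a later call can continue.
    """
    if max_items <= 0 or max_seconds <= 0:
        raise ValueError("both budgets must be positive")
    deadline = time.monotonic() + max_seconds
    processed = 0
    # islice caps the amount of work: it never pulls more than max_items items.
    for item in itertools.islice(items, max_items):
        handle(item)
        processed += 1
        # The time budget is checked after each item, before the next one is pulled.
        if time.monotonic() >= deadline:
            break
    return processed


# Usage: an iterator standing in for a large data source.
pending = iter(range(1_000_000))
done = []
count = process_with_budget(pending, done.append, max_items=100, max_seconds=2.0)
```

Because the deadline is checked between items, a single very slow item can still overshoot the time budget; truly long-running work would also need cancellation inside `handle` itself.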
Possible effects of the lack of loop control
This brings us to the crux of the problem. Whether they are fetching data from a database or processing a list in a loop, most programmers tend to forget one simple fact: if there is no limit on the number of records in a data source, you must assume you will be processing huge amounts of data. And when the amount of data is huge, the application will either freeze for as long as the task takes or throw a timeout exception. Neither behaviour is acceptable in a client or a server application!
A practical solution
Changing habits and doing things differently can be hard for programmers. At Indesys we simply do not allow data to be read directly from the database. All reads go through the business layer, which refuses to return a list of records (e.g. invoices or customers) unless the caller provides a filter (name, status, whatever) and the maximum number of records to fetch at a time. That maximum can be ten, twenty or a thousand records; the exact value does not matter. What matters is that it is set explicitly in the configuration file. Views, as I mentioned, have built-in paging controls that serve a similar purpose.
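A minimal Python sketch of such a business-layer rule, assuming an SQLite database and an INI-style configuration file. `InvoiceService`, `list_invoices`, the `invoices` table and the `max_page_size` setting are all hypothetical names chosen for illustration, not the actual Indesys code.

```python
import configparser
import sqlite3
from typing import List, Tuple

# Hypothetical configuration; in practice this would live in the application's config file.
CONFIG_TEXT = """
[data_access]
max_page_size = 20
"""

Invoice = Tuple[int, str, str]


class InvoiceService:
    """Hypothetical business-layer service: the only sanctioned way to list invoices."""

    def __init__(self, conn: sqlite3.Connection, config: configparser.ConfigParser) -> None:
        self._conn = conn
        # The upper bound comes from configuration, not from the caller.
        self._max_page_size = config.getint("data_access", "max_page_size")

    def list_invoices(self, status: str, limit: int) -> List[Invoice]:
        # A filter is mandatory: there is deliberately no "fetch everything" method.
        if not status:
            raise ValueError("a status filter is required")
        if limit <= 0:
            raise ValueError("limit must be a positive number of rows")
        # Never return more rows than the configured maximum.
        limit = min(limit, self._max_page_size)
        cur = self._conn.execute(
            "SELECT id, customer, status FROM invoices "
            "WHERE status = ? ORDER BY id LIMIT ?",
            (status, limit),
        )
        return cur.fetchall()


# Usage with an in-memory database standing in for the real one.
config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, customer TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO invoices (id, customer, status) VALUES (?, ?, ?)",
    [(i, f"customer-{i}", "open" if i % 2 else "paid") for i in range(1, 1001)],
)
service = InvoiceService(conn, config)
page = service.list_invoices(status="open", limit=500)  # asks for 500, gets at most 20
```

Silently clamping `limit` is one design choice; raising an error when the caller asks for more than the configured maximum is another, and makes misuse more visible during development.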
Of course, processing the data that meet given conditions in chunks, rather than in a single go, brings its own difficulties. Yes, building an application that works this way takes more work on the programmers’ part, but with large databases it is unavoidable. Real life requires us to program differently than the tutorials teach. Tutorials show you how to connect to a database and how to fetch and process small amounts of data. That is fine for a school Computer Science project, but not when you are building a service for an international corporation that will process millions of reports.
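One common way to process matching records in chunks is keyset pagination: each query fetches the next chunk after the last id already seen. Below is a minimal Python sketch assuming an SQLite table called `reports` with an integer `id` and an `amount` column; the table, the filter and `process_in_chunks` are hypothetical. The outer loop is bounded too, and the function returns the last id so a later run can resume.

```python
import sqlite3
from typing import Callable, List, Tuple

Report = Tuple[int, float]


def process_in_chunks(
    conn: sqlite3.Connection,
    min_amount: float,
    chunk_size: int,
    max_chunks: int,
    handle_chunk: Callable[[List[Report]], None],
    after_id: int = 0,
) -> Tuple[int, int]:
    """Process reports with amount >= min_amount, at most chunk_size rows per query.

    Keyset pagination: each query resumes after the last id already seen.
    Returns (rows_processed, last_id) so a later run can continue from last_id.
    """
    if chunk_size <= 0 or max_chunks <= 0:
        raise ValueError("chunk_size and max_chunks must be positive")
    last_id = after_id
    processed = 0
    # Bounded outer loop: at most max_chunks queries per call.
    for _ in range(max_chunks):
        chunk = conn.execute(
            "SELECT id, amount FROM reports "
            "WHERE amount >= ? AND id > ? ORDER BY id LIMIT ?",
            (min_amount, last_id, chunk_size),
        ).fetchall()
        if not chunk:
            break  # no more matching rows
        handle_chunk(chunk)
        processed += len(chunk)
        last_id = chunk[-1][0]
    return processed, last_id


# Usage: 10,000 rows, of which 5,000 match the filter.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO reports (id, amount) VALUES (?, ?)",
    [(i, float(i % 100)) for i in range(1, 10_001)],
)
sizes: List[int] = []
done, checkpoint = process_in_chunks(conn, 50.0, 1000, 2, lambda c: sizes.append(len(c)))
# The first call stops after two chunks; the second resumes from the checkpoint.
more, last = process_in_chunks(
    conn, 50.0, 1000, 10, lambda c: sizes.append(len(c)), after_id=checkpoint
)
```

Paging with `OFFSET` would also work, but it typically gets slower on later pages because the database still has to step over the skipped rows; resuming from the last key avoids that.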
If you think I am exaggerating and loop control is rarely needed, I encourage you to read NASA’s 10 rules for safety-critical code (“The Power of 10” by Gerard J. Holzmann of NASA’s Jet Propulsion Laboratory). Rule number 2 states: “All loops must have a fixed upper-bound. It must be trivially possible for a checking tool to prove statically that a preset upper-bound on the number of iterations of a loop cannot be exceeded. If the loop-bound cannot be proven statically, the rule is considered violated.” Sometimes controlling loop length is a matter of life and death; at other times it decides whether expensive equipment is destroyed, or whether the project misses its deadline and/or goes over budget.
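In the spirit of rule 2, here is a minimal Python sketch of a loop whose upper bound is a named constant in the loop header. The rule was written for C and static checking tools, which Python does not offer in the same form, so this only illustrates the idea. `MAX_HIERARCHY_DEPTH` and `ancestors` are hypothetical names; walking a parent chain is a typical place where bad data (a cycle) would make a plain `while parent is not None` loop run forever.

```python
from typing import Dict, List, Optional

# Preset upper bound, visible in the loop header. Hypothetical value.
MAX_HIERARCHY_DEPTH = 64


def ancestors(parent_of: Dict[str, Optional[str]], node: str) -> List[str]:
    """Return the ancestors of `node`, nearest first.

    The loop runs at most MAX_HIERARCHY_DEPTH times, even if the data
    contains a cycle such as a -> b -> a.
    """
    chain: List[str] = []
    current = parent_of.get(node)
    for _ in range(MAX_HIERARCHY_DEPTH):
        if current is None:
            return chain
        chain.append(current)
        current = parent_of.get(current)
    if current is None:
        return chain
    # The bound was reached: fail loudly instead of looping forever.
    raise RuntimeError(f"hierarchy above {node!r} is deeper than {MAX_HIERARCHY_DEPTH}; possible cycle")


# Usage: a small, well-formed hierarchy.
tree = {"invoice-form": "billing", "billing": "finance", "finance": None}
chain = ancestors(tree, "invoice-form")
```

Raising an error when the bound is hit turns a silent hang into a visible fault, which is usually far easier to diagnose.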
Thanks for reading and regards
CEO, .Net developer, software architect
If you need help with your software project, or need customized software for your company, contact me at: dominik.steinhauf ( at) cys.biz.pl