Business Problem: Our client needed to perform 1.7 million high-density ray-tracing calculations using an in-house application. We ran tests and estimated that on the fastest PC available, this would take approximately two years and nine months. The client needed it done in under six months to meet their schedule on a multimillion-dollar project. They would also have to run such calculations at least once a year for the next several years as part of ongoing support for the project.
Solution: We evaluated the client's assets and noted that they had a large number of highly underutilized computers: the desktops and laptops used for daily work by their employees. These computers were used mostly for reading email, browsing the web and creating documents using Microsoft Office. The average utilization was between 3 and 5 percent.
We implemented a distributed processing (a.k.a. grid processing) application to run the calculations on the desktops and laptops of the client's employees. A server application distributes small chunks of data to each PC. The distributed client running on the PC processes the data and returns the results to the server. The data is processed only when the PC is otherwise idle, so it does not interfere with the employee's normal work. Since the PCs (especially the desktops) are left on 24 hours a day, 7 days a week, the distributed client normally gets 14-16 hours of uninterrupted time on each weekday and 24 hours on each day of the weekend. Taking the lower weekday figure, this yields approximately 118 hours of processing time per PC per week (5 × 14 + 2 × 24).
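The weekly figure can be checked from the daily idle hours quoted above. The short calculation below is only an arithmetic illustration; the function name `weekly_idle_hours` is made up for this example.

```python
# Back-of-the-envelope check of the idle time available per PC per week.
WEEKDAYS = 5
WEEKEND_DAYS = 2


def weekly_idle_hours(weekday_idle_hours, weekend_idle_hours=24):
    """Idle hours per PC per week, given the idle hours on each weekday."""
    return WEEKDAYS * weekday_idle_hours + WEEKEND_DAYS * weekend_idle_hours


low = weekly_idle_hours(14)   # 5 * 14 + 2 * 24
high = weekly_idle_hours(16)  # 5 * 16 + 2 * 24
print(low, high)
```

With 14 idle hours per weekday the total is 118 hours; with 16 it is 128 hours, so the quoted figure corresponds to the conservative end of the range.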
The client was able to reduce the time required to perform the calculations from two years and nine months to three weeks, a reduction of about 98%.
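The percentage can be reproduced with a few lines of arithmetic. This sketch assumes a 52-week year when converting months to weeks; the variable names are illustrative only.

```python
# Convert the original estimate to weeks and compare it with the actual run time.
WEEKS_PER_YEAR = 52
MONTHS_PER_YEAR = 12

original_months = 2 * MONTHS_PER_YEAR + 9  # two years and nine months
original_weeks = original_months * WEEKS_PER_YEAR / MONTHS_PER_YEAR
distributed_weeks = 3

reduction_pct = 100 * (1 - distributed_weeks / original_weeks)
speedup = original_weeks / distributed_weeks
print(f"{reduction_pct:.1f}% less elapsed time, about {speedup:.0f}x faster")
```

Under that assumption the original estimate is 143 weeks, so three weeks is a reduction of just under 98%.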
Technical Details: The software was designed as a client/server system. The client resides on the employees' desktop and laptop systems and performs the calculations. The server runs on a Windows 2000 Server system, doling out the data for calculations and receiving the output. It manages the individual data points, recycling any that have been "checked out" for too long. This prevents the system from developing "holes" in the data resulting from machine crashes, power outages or machines being removed from the network midstream.
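The check-out and recycling behaviour described above can be sketched as a small in-memory queue. This is an illustration of the idea, not the production code: the class `WorkQueue`, its methods, and the `lease_seconds` timeout are hypothetical names, and the real server's data structures and timeout value are not given here.

```python
import time


class WorkQueue:
    """Hands out data points and recycles any held longer than lease_seconds."""

    def __init__(self, points, lease_seconds, clock=time.monotonic):
        self.pending = list(points)   # not yet handed out
        self.checked_out = {}         # point -> time it was handed out
        self.results = {}             # point -> returned result
        self.lease_seconds = lease_seconds
        self.clock = clock            # injectable so the demo can fake time

    def recycle_expired(self):
        """Move overdue points back to the pending list."""
        now = self.clock()
        for point, since in list(self.checked_out.items()):
            if now - since > self.lease_seconds:
                del self.checked_out[point]
                self.pending.append(point)

    def check_out(self):
        """Give the next point to a client, or None if nothing is available."""
        self.recycle_expired()
        if not self.pending:
            return None
        point = self.pending.pop(0)
        self.checked_out[point] = self.clock()
        return point

    def submit(self, point, result):
        """Record a result; a late duplicate for a recycled point is ignored."""
        self.checked_out.pop(point, None)
        if point in self.pending:
            self.pending.remove(point)
        self.results.setdefault(point, result)

    def done(self):
        return not self.pending and not self.checked_out


# Demo with a fake clock: point 1 goes to a PC that never reports back.
fake_now = [0.0]
queue = WorkQueue([1, 2, 3], lease_seconds=60, clock=lambda: fake_now[0])
lost = queue.check_out()
queue.submit(queue.check_out(), "ok")
fake_now[0] = 120.0  # the lease on the lost point has now expired
print(lost, queue.results)
```

Once the lease expires, the next call to `check_out` puts the lost point back in the pending list, so another PC eventually processes it and no hole is left in the results.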
Because the client's network consisted mostly of Microsoft Windows 2000 and Microsoft Windows NT machines, we implemented the distributed client as a service (similar to a daemon). Under Windows NT/2000/XP, a service is a program that is loaded when the machine is booted, even if no user logs in. It continues to run when users are logged out. This is a significant advantage: the distributed client can run whenever the machines are powered on, while users can still log out when not using the system, as required by the client's security policy.
The server application was similarly implemented as a service to run on Windows 2000 Server.
The communication mechanism used is SMB (i.e. Microsoft file sharing). This mechanism was chosen over other options (e.g. implementing a sockets-based protocol on top of TCP/IP) for two reasons:
- Speed of development. By using SMB, we let the OS worry about managing multiple simultaneous connections, handling errors, etc. and were able to use the Microsoft File API, minimizing development effort.
- Ease of review by client's security department. SMB is a mechanism that is well understood and already approved by the security department. Selecting it allowed our program to "piggyback" on that acceptance.
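One way to exchange work over a file share is sketched below. The actual file layout and claiming scheme used by the application are not described here, so everything in this example is an assumption: the `todo`, `claimed` and `done` folders, the file extensions, and the function names are hypothetical, and a temporary local directory stands in for the SMB share. The sketch also assumes the file server lets only one rename of a given file succeed, which is what allows several PCs to compete for the same chunk safely.

```python
import os
import tempfile
from pathlib import Path


def publish_chunk(share, name, payload):
    """Server side: write a chunk, then rename it so clients never see a partial file."""
    tmp = share / "todo" / (name + ".tmp")
    tmp.write_text(payload)
    os.replace(tmp, share / "todo" / (name + ".chunk"))


def claim_chunk(share, host):
    """Client side: claim one chunk by renaming it into this host's folder."""
    for chunk in sorted((share / "todo").glob("*.chunk")):
        claimed = share / "claimed" / host / chunk.name
        claimed.parent.mkdir(parents=True, exist_ok=True)
        try:
            os.rename(chunk, claimed)
        except OSError:
            continue  # another PC claimed it first; try the next chunk
        return claimed
    return None


def return_result(share, claimed, result):
    """Client side: write the result, then release the claim."""
    tmp = share / "done" / (claimed.stem + ".tmp")
    tmp.write_text(result)
    os.replace(tmp, share / "done" / (claimed.stem + ".result"))
    claimed.unlink()


# Demo: a temporary directory stands in for the share (hypothetical layout).
share = Path(tempfile.mkdtemp())
for sub in ("todo", "claimed", "done"):
    (share / sub).mkdir()

publish_chunk(share, "chunk0001", "ray data")
claimed = claim_chunk(share, "pc-17")
return_result(share, claimed, claimed.read_text().upper())
print(sorted(p.name for p in (share / "done").iterdir()))
```

Because every step is an ordinary file operation, connection handling and error reporting are left to the operating system, which matches the development-speed argument in the first bullet.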
The application was designed to work only within the intranet of the client, specifically on the LAN at one facility. Had the client needed the application to function over their WAN or the public Internet, other design choices would likely have been made.
The applications were developed with Microsoft Visual C++ 6.0.