Inference on a device
In this approach, the machine learning model is bundled into the client mobile application. To make a prediction, the application runs all the inference computations locally on the device, on its own CPU or GPU, and does not need to communicate with a server for anything related to machine learning.
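As a concrete illustration, on Android this can be done with TensorFlow Lite by shipping the model file in the app's assets and invoking the interpreter directly on the device. The following Kotlin sketch is illustrative only: the file name `model.tflite` and the input/output tensor shapes are placeholder assumptions and must match your actual model.

```kotlin
import android.content.Context
import org.tensorflow.lite.Interpreter
import java.io.FileInputStream
import java.nio.MappedByteBuffer
import java.nio.channels.FileChannel

// Memory-map a .tflite model bundled in the app's assets folder.
// "model.tflite" is a placeholder name; use your own model file.
fun loadModel(context: Context, assetName: String = "model.tflite"): MappedByteBuffer {
    val fd = context.assets.openFd(assetName)
    FileInputStream(fd.fileDescriptor).channel.use { channel ->
        return channel.map(FileChannel.MapMode.READ_ONLY, fd.startOffset, fd.declaredLength)
    }
}

// Run inference entirely on the device; no network call is involved.
// The [1, 4] input and [1, 2] output shapes are assumptions for this sketch.
fun predict(context: Context): FloatArray {
    val interpreter = Interpreter(loadModel(context))
    val input = arrayOf(floatArrayOf(1.0f, 2.0f, 3.0f, 4.0f)) // shape [1, 4]
    val output = arrayOf(FloatArray(2))                        // shape [1, 2]
    interpreter.run(input, output)
    interpreter.close()
    return output[0]
}
```

Because the model file ships inside the application package itself, this call works even when the device has no network connection.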
Speed is the major reason for doing inference directly on the device. No request has to be sent to a server and no reply has to be waited for; the prediction happens almost instantaneously.
Since the model is bundled with the mobile application, it is not easy to upgrade the model in one place and have every user benefit from it. The application itself has to be upgraded and the upgrade pushed to all active users, which is a significant overhead in effort and time.
Even a small change, such as retraining the model with a few additional parameters, triggers the same cycle of building an application upgrade, pushing it to live users, and maintaining the infrastructure required for the rollout.
The benefits of using this approach are as follows:
- Users can use the mobile application offline; network availability is not essential to operate it.
- Prediction and inference happen very quickly because the model ships with the application and no network round trip is needed.
- The data required for prediction never leaves the device, so users incur no bandwidth cost.
- There is no overhead of running and maintaining server infrastructure, and no server fleet has to be scaled as the number of users grows.
The points we need to be careful about when going for this approach are the following:
- Since the model is included with the application, it is difficult to change it. Changes can be made, but getting them to reach every client application is a costly process in effort and time.
- A large model file can increase the size of the application significantly.
- The prediction logic has to be written separately for each OS platform the application supports, such as iOS and Android.
- The model has to be properly encrypted or obfuscated to ensure it cannot be extracted and reverse engineered by other developers.
In this book, we will look in detail at the SDKs and tools available for performing machine learning tasks locally on the mobile device itself.