Multi-object tracking (MOT) is one of the most essential and challenging tasks in computer vision (CV). Unlike object detectors, current MOT systems are more complicated and typically consist of several neural network models. The balance between system performance and runtime is therefore crucial in online scenarios. While some works achieve improvements by adding more modules, we propose a pruned model that leverages a state-of-the-art Transformer backbone. Our model saves up to 62% of the FLOPs compared with other Transformer-based models and is almost twice as fast. Its results remain competitive with state-of-the-art methods. Moreover, we will open-source our modified Transformer backbone for general CV tasks, as well as the MOT system.