The Montgomery algorithm is a fast modular multiplication method frequently used in cryptographic applications. This paper investigates the digit-serial implementations of the Montgomery algorithm for large integers. A detailed analysis is given and a tight upper bound is presented for the intermediate results obtained during the digit-serial computation. Based on this analysis, an efficient digit-serial Montgomery modular multiplier architecture using carry save adders is proposed and its complexity is presented. In this architecture, pipelined carry select adders are used to perform two final tasks: adding carry save vectors representing the modular product and subtracting the modulus from this addition, if further reduction is needed. The proposed architecture can be designed for any digit size delta and modulus theta. This paper also presents logic formulas for the bits of the precomputation -theta(-1) mod 2(delta) used in the Montgomery algorithm for delta <= 8. Finally, evaluation of the proposed architecture on Virtex 7 FPGAs is presented.