EAND-LPRM: Enhanced Attention Network and Decoding for Efficient License Plate Recognition under Complex Conditions (2024)

1. Introduction

With the rapid advancement of computer technology, deep learning has achieved remarkable progress, finding extensive applications across diverse fields and significantly improving their efficiency. Particularly in image recognition, the integration of deep learning models has substantially enhanced the ability to understand and process images, thereby streamlining recognition tasks [1]. In the field of license plate recognition, researchers have likewise begun to focus on integrating deep learning technology.

The integration of deep learning with end-to-end approaches has therefore become a research focus. End-to-end license plate recognition methods automatically extract relevant features from input license plate images, reducing manual intervention. This not only helps address slow recognition speeds but also avoids the complex algorithmic pipelines of traditional methods. To improve recognition accuracy, scholars in this field have conducted comprehensive research. Ke et al. [2] proposed a two-stage Automatic License Plate Recognition (ALPR) framework. To improve inference speed, they abandoned license plate image rectification and recurrent neural networks, which are difficult to parallelize, and instead used the Multi-scale Recognition Network (MRNet), a lightweight recognition network based on multi-scale features, to recognize license plates in video images even when vehicles are traveling at high speed. Chen et al. [3] proposed a period-based queue length estimation method for isolated signalized intersections based on License Plate Recognition (LPR) data. An improved interpolation method is used to infer the travel time of unmatched vehicles, and the complete arrival and departure information is processed. Three characteristic parameters of the license plates are used as input to a maximum probability function for estimating the queue length on each lane, thereby improving the efficiency and accuracy of license plate localization. Ramajo-Ballester et al. [4] presented an improved vehicle identification approach combining dual license plate recognition and visual feature encoding. They introduced two new datasets: UC3M License Plate (UC3M-LP) for license plate detection and character recognition, and UC3M Vehicle Re-Identification (UC3M-VRI) for vehicle re-identification. Their dual system enhances the robustness of vehicle recognition against image variability. Performance was validated on public and new datasets using a multi-network architecture, achieving notable results. Liu et al. [5] proposed a model for detecting and recognizing license plates in a single stage and adopted a nonlinear loss function to fit the license plate detection process. To reduce information loss, they added a ground-truth intersection term to the Intersection over Union (IoU), obtaining a balanced IoU loss. The combination of these two loss functions enables the model to obtain better predictions and improves the accuracy of license plate detection.

License plate recognition in complex scenarios plays a pivotal role in practical traffic applications and demands heightened attention. While enhancing recognition accuracy remains a fundamental objective, real-world applications frequently involve intricate situations, so it becomes imperative to strengthen license plate recognition in such complex environments. To improve the recognition accuracy of license plate images in complex real-world scenes, Sultan et al. [6] proposed a method combining a Faster Region-based Convolutional Neural Network (Faster R-CNN) with morphological operations to locate the LP region in the detected vehicle, addressing the problems of irregular contours, angle changes, and occlusion of the license plate and meeting the requirements of real-time license plate recognition in realistic scenes. To reduce the influence of external factors on license plate recognition, Kim et al. [7] presented the Adaptive Feature Attention Network (AFA-Net), an adaptive feature attention network for license plate recognition (LPR) that addresses low-resolution and motion-blur issues in dash cam images, significantly improving LPR performance. They also introduced the Joint Image Restoration and License Plate Recognition Network (Joint-IRLPRNet), which further enhances performance by simultaneously restoring and recognizing license plates. To improve the license plate recognition rate in special scenarios, Pattanaik et al. [8] proposed a generative adversarial network that combines image super-resolution and deblurring with a discrete cosine transform discriminator. The Discrete Cosine Transform (DCT), with its low computational complexity, is used to remove various types of blur and distortion from the license plate, achieving better recognition of license plate images in complex environments.

The research mentioned above highlights that, under ideal conditions and specific circumstances, license plate recognition demonstrates commendable accuracy. However, existing methods falter when confronted with multiple complex real-world traffic scenarios, resulting in subpar license plate recognition performance [9]. For instance, factors such as deteriorating image quality due to weather changes, uneven license plate brightness caused by lighting conditions, poor dataset quality, blurred license plates due to high vehicle speeds, and license plate deformation caused by varying capture angles present considerable challenges to both the robustness and accuracy of recognition [10]. To solve these problems, we propose the enhanced attention network and decoding for license plate recognition model (EAND-LPRM). This model specifically targets the difficulties presented by complex traffic scenarios by leveraging an encoder to extract deep features from image sequences and employing a self-attention mechanism to focus on critical feature information. This adaptation is essential to tackle the complexities of real-world traffic scenarios effectively, demanding continuous refinement and enhancement in the field.

Regarding the issues mentioned above, this paper makes significant contributions in two main aspects.

  • We integrated various publicly available datasets and meticulously curated a diverse collection of license plate images encompassing a wide range of scenarios and categories. The selection criteria for these datasets were based on their ability to provide a comprehensive representation of real-world conditions. Specifically, our criteria focused on encompassing different weather conditions such as heavy rain and dense fog, which result in low visibility, as well as addressing challenges such as license plate deformation due to varying capture angles and obstruction of plate information due to occlusions. Additionally, we included datasets capturing images under challenging lighting conditions to reflect scenarios with poor visibility caused by inadequate illumination. This comprehensive approach better reflects the complex situations encountered in traffic scenes and represents real-world traffic scenarios more accurately. By incorporating datasets that specifically cater to these conditions, we aimed to equip our model with a stronger ability to cope with special traffic scenarios. This strategy enhances the robustness and adaptability of our model, ensuring that it performs optimally across a wide range of challenging conditions commonly encountered in practical applications.

  • We proposed the EAND-LPRM (enhanced attention network and decoding for license plate recognition model) network model, showcasing advanced feature representation capabilities. This model excels in concurrently addressing license plate recognition tasks across various intricate scenarios and challenging weather conditions, thereby optimizing the recognition process.

In the subsequent sections, we provide a detailed overview of our methodology and experimental setup. Section 2 of the paper details the methodology employed in the proposed EAND-LPRM. Section 3 presents a detailed examination of the architecture and functionality of the EAND-LPRM network model, with a particular focus on its advanced feature representation capabilities designed for complex license plate recognition tasks. Additionally, Section 4 discusses the pivotal role of datasets in model training and presents our meticulously curated dataset, the Complex Parking Dataset (CMPD). Section 5 delves into the evaluation of the proposed EAND-LPRM model, presenting results, implementation details, and comparisons with other license plate recognition approaches. Following this, Section 6 offers a comprehensive discussion of the findings and implications of the EAND-LPRM model in the realm of license plate recognition. Lastly, Section 7, the concluding part of the paper, summarizes the key findings and implications of the proposed EAND-LPRM model for license plate recognition tasks.

2. Materials and Methods

The proposed EAND-LPRM method is organized into modules comprising a self-attention mechanism, a residual connection method, a recurrent neural network, and the connectionist temporal classification (CTC) method. The training process uses a convolutional neural network to extract deep features. We employ hyper-parameter settings including a learning rate of 0.001, a batch size of 256, and the Adam optimizer for efficient convergence. The extracted features are then passed to the next module, which uses a self-attention mechanism to transform them into a sequence that can be decoded. The self-attention mechanism ensures that critical feature information is attended to, enhancing the model’s ability to handle complex traffic scenarios. The transformed sequence is fed into the label prediction module, which uses a sequence-to-sequence model to predict the characters of the license plate, ensuring accurate and efficient recognition of the license plate information.
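
To make the stated training configuration concrete, the following is a minimal PyTorch sketch of how these hyper-parameters could be wired up; the tiny stand-in network and the random tensors are placeholders, not the EAND-LPRM implementation or the CMPD data.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Tiny stand-in network and random tensors; placeholders for the EAND-LPRM
# architecture and the license plate images described in the text.
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 68))
images = torch.randn(512, 3, 32, 96)        # fake 32x96 plate crops
labels = torch.randint(0, 68, (512,))       # fake single-character labels
loader = DataLoader(TensorDataset(images, labels), batch_size=256, shuffle=True)

# Hyper-parameters stated above: learning rate 0.001, batch size 256, Adam.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for batch_images, batch_labels in loader:   # one pass over the dummy data
    optimizer.zero_grad()
    loss = criterion(model(batch_images), batch_labels)
    loss.backward()
    optimizer.step()
```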

2.1. Self-Attention Mechanism

The self-attention mechanism is designed to discern relationships among characters located at various positions within a license plate character sequence, thereby enhancing comprehension of dependencies among elements within the sequence [11]. The flexibility and efficiency of this method make it a powerful tool in license plate recognition tasks, especially in dealing with diverse license plate numbers. It proves particularly adept at accurately capturing license plate information within diverse and intricate environments, thereby augmenting recognition accuracy and robustness [12]. In our study, we integrated the self-attention mechanism into the decoder, facilitating improved connections among the extracted information. This integration results in more precise predictions of license plate image details, thereby enhancing the overall performance of the system.
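
As an illustration of the mechanism described above, a minimal single-head scaled dot-product self-attention module might look as follows in PyTorch; the sequence length and feature dimension are illustrative assumptions, and this is not the authors’ exact decoder component.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """Minimal single-head self-attention over a character feature sequence."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x):                        # x: (batch, seq_len, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        # Scores relate every sequence position to every other position.
        scores = q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5)
        weights = F.softmax(scores, dim=-1)      # attention weights per position
        return weights @ v                       # context-enriched features

# Example: a batch of 4 sequences with 18 feature steps of dimension 256.
features = torch.randn(4, 18, 256)
out = SelfAttention(256)(features)               # shape: (4, 18, 256)
```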

To underscore the importance of maintaining high video quality for effective recognition, we reference studies that focus on video quality assessment databases in license plate recognition. For instance, Kabiraj et al. [13] have demonstrated that high-quality video inputs significantly enhance recognition performance by ensuring clearer and more distinct character sequences. This finding aligns with our emphasis on the self-attention mechanism, which benefits greatly from high-quality input data to function optimally.

2.2. Residual Connection

The residual connection method is founded on the understanding that in deep networks, the input might traverse numerous layers, leading to the vanishing or exploding gradients phenomenon. This occurrence can render the model difficult to train or even result in training failure [14]. However, integrating direct connections between distinct network layers enables the input to bypass intricate network layers, thereby alleviating gradient-related challenges. By merging the output of the direct connection with the output of the final layer, additional relevant information is assimilated into the initial output. This process ensures that the residual connection not only enhances the original output but also preserves its essence. Consequently, the introduction of residual connections typically yields improved experimental results, surpassing the baseline accuracy level.

In the realm of license plate recognition, the employment of recurrent neural networks often gives rise to the challenge of gradient vanishing, primarily due to the vast diversity inherent in license plate data. This diversity manifests as variable-length character sequences within license plates. The introduction of Integrating Residual En-Decoding and Feature Attention Mechanisms (REDF-LSTM) by Li et al. [15] represents a significant advancement in deep learning architecture. This novel model seamlessly integrates residual learning encoder–decoder LSTM layers, enhanced LSTM layers, and feedforward attention mechanisms. Through rigorous experimentation and comparison with state-of-the-art models, REDF-LSTM has demonstrated superior predictive performance across key indicators. This underscores the efficacy of integrating residual connections in deep learning architectures to address gradient-related challenges and enhance predictive accuracy across diverse domains. In license plate recognition tasks, where deep networks often encounter gradient-related issues due to the traversal of numerous layers, integrating residual connections provides a robust solution. By leveraging residual network connections, the EAND-LPRM model can effectively capture intricate features and accurately discern influential factors. This capability results in substantial enhancements in predictive accuracy, highlighting the significance of residual connections in tackling challenges inherent in deep networks.

Given the intricate nature of license plate recognition tasks, the integration of residual connections proves instrumental in mitigating the challenges encountered during training [16]. Suppose the input x denotes the output from the preceding layer following the rectified linear unit activation function. The mapping function F(x) comprises two 3 × 3 convolutional layers. The resulting output H(x) after the implementation of the residual connection can be mathematically represented as follows:

H(x) = F(x) + x,

From the formula, it is evident that the output, following the introduction of residuals, is influenced not only by the convolutional transformations but also by the original input data, denoted as x. This characteristic aids in mitigating the issues of gradient vanishing and exploding that may occur during the backpropagation process.
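
A minimal residual block matching this description, two 3 × 3 convolutions plus an identity shortcut so that H(x) = F(x) + x, could be sketched as follows; the channel count and input size are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """H(x) = F(x) + x, where F(x) consists of two 3x3 convolutional layers."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        f = self.conv2(F.relu(self.conv1(x)))    # F(x): two 3x3 convolutions
        return f + x                             # identity shortcut: H(x) = F(x) + x

x = torch.randn(1, 64, 24, 94)                   # e.g. a 64-channel feature map
h = ResidualBlock(64)(x)                         # same shape as x
```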

2.3. Recurrent Neural Network

The recurrent neural network (RNN) method is employed to systematically capture the sequential dependencies among license plate characters or digits. It operates by utilizing the hidden state of the preceding input vector to retain information from the previous time step’s license plate characters. Alongside the input vector of the current time step, it forecasts the succeeding license plate characters [17]. By employing the RNN method for character recognition in license plates, weight parameters pertinent to character information can be shared, thereby enhancing the connectivity of features between characters. Moreover, this method allows for automatic adaptation to license plates of varying lengths during recognition, thereby enhancing the model’s overall generalization performance [18]. Hence, the retention of information from the previous time step through the hidden state is pivotal. The formula governing the update of the hidden state across different time steps is as follows:

h_t = f(W_{xh} x_t + W_{hh} h_{t-1}),

In the given formula, h_t represents the hidden state at the current time step, f denotes the activation function, W_{xh} and W_{hh} are weight matrices, x_t represents the input vector, and h_{t-1} is the hidden state from the previous time step. By combining the previous hidden state and the current input with their corresponding weights and then applying the rectified linear unit (ReLU) activation function, the hidden state for the current time step is obtained, from which the prediction is generated.

y_t = g(W_{hy} h_t),

In the given formula, y_t represents the output at the current time step, g denotes the output activation function, W_{hy} signifies the weight matrix, and h_t represents the hidden state at the current time step. Integrating the recurrent neural network with the encoder proves instrumental in capturing a more exhaustive array of character information from the license plate image. This integration enriches the decoder with supplementary character details and features, thereby enhancing its ability to process and interpret the data effectively.
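
The two update equations can be written directly as a small recurrent cell; the dimensions below and the choice of softmax for the output activation g are illustrative assumptions rather than the paper’s exact configuration.

```python
import torch
import torch.nn as nn

class SimpleRNNCell(nn.Module):
    """Implements h_t = ReLU(W_xh x_t + W_hh h_{t-1}) and y_t = g(W_hy h_t)."""
    def __init__(self, input_dim, hidden_dim, num_classes):
        super().__init__()
        self.w_xh = nn.Linear(input_dim, hidden_dim, bias=False)
        self.w_hh = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.w_hy = nn.Linear(hidden_dim, num_classes, bias=False)

    def forward(self, x_seq):                    # x_seq: (batch, T, input_dim)
        h = x_seq.new_zeros(x_seq.size(0), self.w_hh.in_features)
        outputs = []
        for t in range(x_seq.size(1)):
            # Hidden-state update: combine current input and previous hidden state.
            h = torch.relu(self.w_xh(x_seq[:, t]) + self.w_hh(h))
            # Output at this step: y_t = g(W_hy h_t), here with g = softmax.
            outputs.append(torch.softmax(self.w_hy(h), dim=-1))
        return torch.stack(outputs, dim=1)       # (batch, T, num_classes)

y = SimpleRNNCell(256, 128, 68)(torch.randn(4, 18, 256))
```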

2.4. Connectionist Temporal Classification

The connectionist temporal classification (CTC) method, proposed within the realm of deep learning, serves as a valuable tool for classifying and labeling sequence data. Specifically designed to handle sequences of diverse lengths, CTC plays a pivotal role in mapping input sequences to output sequences, allowing for unaligned states between the input and output data [19]. Consequently, CTC offers substantial advantages in addressing sequence-labeling challenges associated with varying sequence lengths. In the domain of recognizing license plate characters, diverse character sequences arise because different license plate types have different lengths. This discrepancy creates a non-uniform alignment relationship between the input and output data. CTC tackles this challenge by enabling each label in the output license plate sequence to correspond to multiple time steps in the input license plate sequence. In this way, it accommodates the inherent variability in sequence lengths observed across different license plate types and character counts.

CTC is instrumental in handling ambiguous alignments. By employing the CTC loss function, a license plate recognition model can be trained, enabling the utilization of decoding algorithms to convert the model’s output into the definitive license plate character sequence. This methodology has demonstrated remarkable outcomes in various domains, including license plate recognition [20]. Consider an input license plate sequence, denoted as X, with a length of T, a target output license plate character sequence Y, and a predicted output license plate character length of U. The objective is to identify the most probable license plate character output sequence Y based on the provided input license plate character sequence X. To account for empty characters or gaps between characters, it is essential to establish an intermediate sequence C with a length of T. Sequence C encompasses all potential characters found on license plates, including a designated blank character. The CTC loss function is designed to compute the likelihood probability of the output sequence Y. To achieve this, CTC introduces an intermediate variable L, representing conceivable sequences involving repeated characters and blank characters. A neural network is employed to model the conditional probability P(L|X). Subsequently, P(Y|X) is computed by summing across all feasible label sequences. The CTC formula is expressed as follows:

P(L|X) = \alpha(X, L),

In the given formula, we define the conditional probability P(L|X) to represent the likelihood of label sequence L given the input sequence X. In this context, α(X,L) signifies the output generated by the neural network.

P(Y|L) = \prod_{l \in L} p(l),

In the given context, the conditional probability P(Y|L) is defined as the probability of generating the output sequence Y given the label sequence L. Conventionally, this conditional probability is established through a mapping function, where p(l) represents the probability of mapping the label l to an output character.

P(Y|X) = \sum_{L} P(Y|L) \, P(L|X),

In the given formula, the objective is to compute the sum over all potential label sequences L. The ultimate objective is to determine the most probable output sequence Y using dynamic programming algorithms.

Y^* = \arg\max_{Y} P(Y|X),

In license plate recognition tasks, connectionist temporal classification (CTC) plays a pivotal role by modeling the conditional probabilities P(L|X) and P(Y|L), and employing dynamic programming algorithms to compute the most probable output sequence Y*. This process yields accurate predictions for license plate character information. The utilization of this formula and method holds significant implications in recognizing license plate character sequences of variable lengths.
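
A hedged sketch of how this objective is typically applied with PyTorch’s built-in torch.nn.CTCLoss follows; the alphabet size, sequence lengths, and blank index are assumed values for illustration.

```python
import torch
import torch.nn as nn

T, B, C = 18, 4, 68            # input time steps, batch size, characters + blank
ctc = nn.CTCLoss(blank=0)      # index 0 reserved for the CTC blank symbol

# Log-probabilities from the recognition head, shape (T, batch, num_classes).
log_probs = torch.randn(T, B, C, requires_grad=True).log_softmax(dim=-1)

# Target character indices for each plate (7 characters per plate here).
targets = torch.randint(1, C, (B, 7))
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.full((B,), 7, dtype=torch.long)

loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()                # trainable end to end, no explicit alignment needed
```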

3. Modeling

The proposed EAND-LPRM method consists of three components: an encoding feature extraction module, decoding sequence transformation module, and sequence label prediction module. These components are connected in an end-to-end manner to address deformations and disturbances in license plate images while achieving optimal processing speed. The overall architecture is illustrated in Figure 1.

The encoding feature extraction module employs a convolutional neural network comprising 29 independent convolutional layers and residual blocks. The initial convolutional layers of the model extract low-level features from the input license plate images. By gradually adjusting the output channels, we constructed a deeper network architecture aimed at capturing deeper-level information of the license plate characters. To mitigate gradient explosion issues, residual connections were introduced into the encoder. This not only alleviates problems related to gradients but also increases the depth of the network, enabling it to capture complex deep features within the license plate images. When dealing with complex scenes and license plates, the model learns additional features, thereby improving recognition performance. It is noteworthy that the output first passes through layers with 64 channels and is then reduced to 32 channels, with the specific channel transformation module illustrated in Figure 2. This progressive reduction in the number of channels allows the model to compress information during the feature extraction stage, effectively preserving the important features of the input license plate images. By increasing the number of channels in the later layers, we can flexibly integrate both low-level and high-level features of the license plate characters. This approach enables the model to capture features of different depths, thereby identifying various complex details of license plates. It enhances its flexibility in diverse environments and strengthens its ability to recognize license plate images across multiple complex scenarios.
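
As a rough illustration of the channel transformation described above, the following sketch expands the input to 64 channels and then compresses it to 32; the use of batch normalization and the exact kernel sizes are assumptions, and this is a simplified stand-in for the module in Figure 2 rather than its actual definition.

```python
import torch
import torch.nn as nn

# Simplified stand-in for the channel transformation in Figure 2:
# expand low-level features to 64 channels, then compress them to 32.
channel_transform = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(inplace=True),
)

plate = torch.randn(1, 3, 48, 144)          # an RGB license plate crop
features = channel_transform(plate)         # shape: (1, 32, 48, 144)
```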

In the decoding sequence transformation module, we integrate the self-attention mechanism into the decoder architecture to effectively model the sequence dependencies between license plate characters. This helps the model to focus more on the important feature information at each position within the license plate data, enabling it to better understand the contextual relationships between characters, thereby enhancing its ability to capture key information and improve character recognition accuracy. The self-attention mechanism directly replaces global convolution, making the model more lightweight while also increasing its efficiency and speed. This mechanism further enhances the model’s adaptability to various complex environments, particularly demonstrating exceptional capabilities when handling low-quality license plate images.
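
One way such a decoder stage can be realized is to flatten the encoder feature map into a sequence of positional tokens and apply multi-head self-attention in place of a global convolution; the head count and dimensions below are illustrative assumptions, not the paper’s configuration.

```python
import torch
import torch.nn as nn

# Flatten a (batch, C, H, W) encoder feature map into a sequence of positional
# tokens and apply multi-head self-attention instead of a global convolution.
feat = torch.randn(4, 32, 6, 24)                  # encoder output (B, C, H, W)
seq = feat.flatten(2).permute(0, 2, 1)            # (B, H*W, C): one token per position

attention = nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True)
context, weights = attention(seq, seq, seq)       # self-attention over positions
print(context.shape)                              # torch.Size([4, 144, 32])
```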

In the sequence label prediction module, the connectionist temporal classification (CTC) method is employed to process the decoder’s output results, mapping input sequences to output sequences without the need for explicit alignment between input and output sequences. This makes the model more flexible, suitable for input and output sequences of variable lengths, effectively addressing the challenge of varying numbers of license plate characters due to the diversity of license plate types in the dataset. This method is unaffected by duplicate characters and can accurately identify license plates with different character lengths. Therefore, this approach significantly enhances the model’s generalization ability, ensuring precise recognition across various license plate formats and variations.
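
For the prediction step, a common greedy (best-path) CTC decoding routine takes the most likely symbol at each time step, collapses repeats, and drops blanks; this is a generic sketch and not necessarily the exact decoding algorithm used by the authors.

```python
import torch

def ctc_greedy_decode(log_probs, blank=0):
    """Best-path CTC decoding: argmax per step, collapse repeats, drop blanks."""
    best_path = log_probs.argmax(dim=-1)          # most likely class at each step
    decoded, previous = [], blank
    for idx in best_path.tolist():
        if idx != blank and idx != previous:      # skip blanks and repeated symbols
            decoded.append(idx)
        previous = idx
    return decoded

# Example: 18 time steps over an alphabet of 68 symbols (index 0 = blank).
log_probs = torch.randn(18, 68).log_softmax(dim=-1)
print(ctc_greedy_decode(log_probs))               # variable-length list of indices
```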

4. Datasets

In the realm of model training, datasets assume an indispensable role. Models heavily depend on the intricate patterns and distinct characteristics found within datasets to draw inferences and make predictions. The absence of access to a dataset hampers a model’s ability to grasp these features, thereby rendering it incapable of effective training or practical application [21]. Training a model with meticulously curated, high-quality, and properly segmented datasets yields experimental results that are not only objective but also highly persuasive. In the domain of license plate recognition technology, recognized datasets such as CCPD [22] and CRPD [23] are prevalent. While these datasets provide ample data for model training, they are plagued by certain limitations. Typically, these datasets encompass a limited variety of license plate types, possess restricted geographical coverage, feature license plate images of subpar quality, and lack appropriate proportioning for training, validation, and testing sets [24]. Consequently, models trained on such datasets tend to yield results that are susceptible to subjective influences. This leads to reduced persuasiveness and limited practical applicability. Furthermore, these models fail to adequately mirror the complexities posed by various intricate conditions encountered in real-world scenarios [25].

In order to ensure the objectivity of the model in real-world applications after the training phase, we curated a specialized dataset called the Complex Parking Dataset (CMPD), tailored specifically for real-world traffic scenarios in China. This dataset was constructed by amalgamating publicly accessible datasets, including the CCPD (Chinese City Parking Dataset), provided by researchers at the University of Science and Technology of China; the CRPD (Chinese Road Plate Dataset), offered by scholars at the University of Electronic Science and Technology of China; and the CBLPRD (China-Balanced-License-Plate-Recognition-Dataset) from various academic sources. Additionally, we integrated self-captured data from diverse locations within Xiamen City. Following data cleaning and labeling, we have crafted a dataset tailored to real-world traffic applications, ensuring its relevance and applicability in practical scenarios, as visually depicted in Figure 3. The dataset was partitioned into training (7000 images), validation (2000 images), and test (1000 images) sets in a ratio of 7:2:1, as sketched after this paragraph. This partitioning provides robust data support for model testing, continuous learning, and precise evaluation of model performance. Our dataset encompasses a comprehensive collection of license plate images originating from all provinces in China. It includes a diverse range of special vehicle license plates, such as those belonging to traditional fuel vehicles, new-energy vehicles, foreign consulates, police cars, instructor vehicles, civil aviation vehicles, and specialized operation vehicles, as well as plates from Guangdong and Hong Kong. To enhance the robustness of our trained models, we have incorporated hybrid license plates composed of both traditional fuel and new-energy vehicle plates. To augment the complexity of the dataset, we introduced noise and low-quality images with various interferences. These additions simulate real-world traffic scenarios in which license plates undergo deformation, face adverse weather conditions, and encounter high speeds and uneven lighting, among other challenging recognition situations. By integrating such diverse and challenging instances, our dataset not only broadens the spectrum of license plate types but also elevates the model’s ability to generalize. This enriched dataset significantly contributes to the current landscape of license plate recognition datasets, substantially increasing the sample size and enhancing the model’s overall performance and adaptability.
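
A minimal sketch of the 7:2:1 partitioning described above is shown below; the placeholder tensors stand in for the 10,000 assembled CMPD images, and the fixed seed is an assumption added for reproducibility.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Placeholder tensors standing in for the 10,000 assembled CMPD images.
cmpd_dataset = TensorDataset(torch.zeros(10000, 1))

# 7:2:1 split into 7000 training, 2000 validation, and 1000 test samples.
generator = torch.Generator().manual_seed(42)     # fixed seed (an added assumption)
train_set, val_set, test_set = random_split(
    cmpd_dataset, [7000, 2000, 1000], generator=generator)

print(len(train_set), len(val_set), len(test_set))  # 7000 2000 1000
```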

5. Results

5.1. Evaluation Metrics

To assess the performance of both the baseline model and the EAND-LPRM model, we introduced three evaluation metrics: test accuracy, loss, and time. Firstly, we used test accuracy to determine the model’s performance on the test dataset, which represents the proportion of license plate samples correctly predicted by the model. It is calculated as

\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}},

Secondly, we considered the experimental loss metric, which measures the errors occurring during the training process, indicating the degree of difference between the model predictions and the true labels of the license plates. It is typically measured using the cross-entropy loss function, defined as

L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right],

Lastly, we focused on the time metric, representing the time required for the model to complete a license plate character prediction task, which is particularly crucial for applications requiring real-time processing.
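
The three metrics can be computed in a few lines; the random tensors below stand in for real model outputs, and the whole-plate criterion (a sample counts as correct only if every character matches) is an assumption consistent with the accuracy definition above.

```python
import time
import torch
import torch.nn.functional as F

# Dummy outputs and labels for 1000 test plates with 7 characters each over an
# alphabet of 68 symbols; stand-ins for real model predictions and ground truth.
logits = torch.randn(1000, 7, 68)
labels = torch.randint(0, 68, (1000, 7))

start = time.perf_counter()
predictions = logits.argmax(dim=-1)               # predicted character indices
elapsed = time.perf_counter() - start             # time metric, in seconds

# A plate counts as correct only if every character matches (assumed criterion).
correct = (predictions == labels).all(dim=1).sum().item()
accuracy = correct / labels.size(0)

# Cross-entropy loss averaged over all character positions.
loss = F.cross_entropy(logits.reshape(-1, 68), labels.reshape(-1))

print(f"accuracy={accuracy:.3f}  loss={loss.item():.3f}  time={elapsed:.4f}s")
```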

5.2. Implementation Details

These experiments were implemented using the PyTorch (version 2.0.1) framework, and the proposed method was executed on an Nvidia RTX 3090 GPU. The experiments involved training for 200 epochs on a publicly available dataset as well as our proprietary dataset. Even with a relatively small number of iterations, the model demonstrated remarkable results, clearly and objectively proving its superior performance in handling various factors in complex scenarios. This empirical evidence highlights the robustness and applicability of the model in real-world license plate recognition applications.

5.3. Comparison of Experiments

In this section, we conducted experiments on different datasets to compare our proposed method with other license plate recognition approaches, demonstrating the effectiveness of EAND-LPRM. Under the same experimental settings, we compared our method with the Convolutional Recurrent Neural Network (CRNN), ResNet18, and ResNet50 on the publicly available CCPD dataset. The specific experimental results are presented in Table 1. Our method exhibited outstanding performance across all four evaluation metrics, surpassing all other license plate recognition methods. In particular, when presented with input images containing license plate deformation and motion blur acquired from real-world traffic environments, EAND-LPRM demonstrated the most robust recognition performance, outperforming all other methods.

To further validate the rationality of the EAND-LPRM model, we compared it with CRNN, ResNet18, and ResNet50 using our self-developed CMPD dataset, which covers diverse license plate types and is properly partitioned. As shown in Table 2, the recognition accuracy of the EAND-LPRM model on our custom dataset consistently outperforms other models. This robust performance once again confirms the ability of the EAND-LPRM model to achieve high recognition accuracy in handling challenging and complex traffic conditions. Under regular traffic conditions, the EAND-LPRM model can effectively replace other models, demonstrating its potential as a powerful choice for license plate recognition applications.

Considering the emphasis on recognition efficiency, we conducted a comprehensive evaluation of the recognition time of the various models on different datasets. Through model evaluation metrics such as test accuracy, best accuracy, average accuracy, and training time, we assessed the effectiveness and reliability of the experimental results. As shown in Figure 4, our proposed EAND-LPRM model achieved the highest recognition accuracy on both the public dataset and the CMPD dataset.

The x-axis represents the time in seconds, and the y-axis represents the recognition accuracy. Through rigorous experimental validation, our proposed EAND-LPRM model has been empirically substantiated as a feasible and effective solution. By integrating the encoder–decoder structure with self-attention mechanisms, our approach adeptly captures vital features within license plate images, facilitating accurate recognition of character sequences even amidst multiple intricate conditions. Our model possesses superior recognition capabilities and remarkable robustness in license plate character recognition tasks. Consequently, employing our model for license plate recognition in real-world traffic situations can yield outstanding results. This attests to the practical viability and efficacy of our proposed EAND-LPRM model in addressing the challenges posed by varied and challenging license plate recognition scenarios.

6. Discussion

In this paper, we conducted license plate recognition experiments using the EAND-LPRM model on a publicly available dataset as well as a proprietary dataset generated by our team. Based on the encoder–decoder architecture, the EAND-LPRM model achieved an outstanding recognition accuracy of 94% under various complex conditions, significantly improving the accuracy of license plate recognition technology in diverse and complex scenarios. The encoder component of the EAND-LPRM model utilizes a convolutional neural network to extract deep features from the license plate information, enabling automatic sequence alignment. This approach aligns with existing methods that have leveraged CNNs for similar tasks. For instance, Jawale et al. [26] proposed a CNN-based method for automatic vehicle license plate recognition under various conditions. Tao et al. [27] employed the YOLOv5-based Plate Detection and License Plate Recognition (YOLOv5-PDLPR) model, which integrates a multi-head attention mechanism for character recognition and a global feature extractor to improve feature completeness. They also utilized a parallel decoder architecture to enhance inference efficiency. While their method showed improved accuracy and robustness in complex scenes, our EAND-LPRM model’s use of a self-attention mechanism within the decoder similarly focuses on critical features but also emphasizes the relative positioning of characters, potentially offering enhanced recognition accuracy in dynamic and adverse conditions. Our findings not only align with the innovations in the current literature but also build upon them by introducing unique elements such as the self-attention mechanism, which further refines the model’s capability to handle complex and diverse license plate recognition scenarios.

In contrast to these approaches, our method introduces a self-attention mechanism within the decoder, which enhances the model’s ability to focus on critical features and capture the relative positions of license plate characters more effectively. In the encoder component, a convolutional neural network was utilized to extract deep features from the license plate information, enabling automatic sequence alignment. The fusion of convolutional neural networks with residual networks effectively mitigated challenges related to gradients. Furthermore, the decoder’s self-attention mechanism proficiently captures and memorizes the relative positions of license plate characters. This enhancement bolstered the model’s understanding and recognition of license plate characters, resulting in more accurate character predictions and higher recognition accuracy. Therefore, the model demonstrated superior adaptability to the various complexities of license plate recognition tasks, successfully capturing the intricate features of license plate images even under different and complex conditions.

In our study, to support the claim of superior performance of the EAND-LPRM model, we conducted detailed statistical analyses. Comparing against the baseline models (CRNN, ResNet18, and ResNet50), we used independent-samples t-tests or analysis of variance (ANOVA), as appropriate, to evaluate whether the differences in performance across the evaluation metrics were statistically significant. The results of the statistical analyses indicate that the EAND-LPRM model significantly outperforms the other methods across all four evaluation metrics. These results not only demonstrate the high accuracy and robustness of the model but also statistically confirm that the performance improvements are real and not due to chance. This further enhances the credibility and reliability of our research findings.

Our model achieved high accuracy recognition even when faced with a variety of complex traffic conditions, encompassing challenges such as those arising from complex environmental factors, high-speed driving, poor dataset quality, and license plate deformation. The comprehensive approach adopted by our model endows it with robustness to overcome these complexities, marking significant progress in the field of license plate recognition technology.

In the field of license plate recognition, the quantity and quality of datasets are crucial for determining recognition accuracy, especially under exceptional conditions. The popular public dataset CCPD, despite its large sample size, primarily consists of license plate data from Anhui Province. However, this dataset has limitations related to sample diversity and practical applicability. In contrast, our custom dataset covers various types of license plates and has been systematically categorized. This systematic approach enables it to accurately represent the model’s recognition accuracy in real-world traffic scenarios. The advantage of our dataset lies in its ability to effectively simulate the complex traffic conditions encountered in actual traffic environments. This diversity provides valuable data for rigorous testing and analysis, thereby facilitating comprehensive and objective evaluations of experimental models. Therefore, our dataset represents a significant advancement, offering a more representative and practical foundation for evaluating license plate recognition models under various challenging conditions.

Through data analysis, it is evident that experiments conducted on publicly accessible datasets typically yield higher results compared to experiments conducted on custom datasets. This difference indicates that the CMPD dataset accurately reflects the actual recognition accuracy and can depict real traffic situations. The effectiveness of the EAND-LPRM model is apparent in its ability to handle a variety of adverse conditions. Whether it is the more straightforward conditions found in publicly available datasets or the more challenging environments represented in our custom dataset, the model consistently performs at a high level. This robustness underscores the model’s adaptability and reliability in diverse and complex traffic situations, including adverse weather conditions, varying lighting conditions, and different angles of license plate captures. Moreover, the encoder–decoder architecture of our model, combined with the self-attention mechanism, has proven to be particularly effective in recognizing license plates accurately, even when faced with significant distortions or obstructions. This is a notable improvement over traditional methods, as it enables more precise alignment and prediction of license plate characters.

The ability of our model to generalize well across different datasets is a testament to its comprehensive design and the thoroughness of the training process. By utilizing a wide range of data that mirror real-world conditions, we have ensured that the EAND-LPRM model can maintain high performance and accuracy in practical applications. This makes it a valuable tool for modern traffic management systems, contributing to the advancement of urban intelligence and enhancing the efficiency of traffic monitoring and enforcement.

While the EAND-LPRM model has shown impressive results, there are areas for improvement and further research. One notable challenge is the recognition of double-layer license plates. Double-layer license plates are often used in regions with specific regulatory requirements and can contain additional information such as state identifiers, vehicle categories, or special characters. These plates can vary significantly in format, layout, and character arrangement compared to standard single-layer plates. To address these challenges, future work should focus on developing specialized modules or pre-processing steps. For instance, a dedicated pre-processing algorithm could be designed to separate the two layers and normalize their formats before recognition. Additionally, implementing a more sophisticated segmentation algorithm that can dynamically adjust to different plate layouts would be beneficial. Furthermore, incorporating machine learning techniques such as transformers or graph neural networks could enhance the model’s ability to understand the spatial relationships between characters in different layers. These advanced techniques can help in creating a more flexible and robust recognition system capable of handling the complexities of double-layer license plates.

Expanding the dataset to include more diverse and rare scenarios, such as extreme weather conditions or obscure angles, would also provide a more comprehensive evaluation of the model’s performance. Additionally, focusing on video-based recognition and incorporating real-time processing capabilities could extend the practical applications of the model in dynamic traffic environments. By addressing these specific challenges, we aim to improve the EAND-LPRM model’s performance and extend its applicability to a wider range of real-world scenarios.

7. Conclusions

This study proposes the EAND-LPRM model to address the issue of low accuracy in license plate recognition tasks under low-quality datasets and complex scenarios. The proposed network integrates the advantages of convolutional neural networks, self-attention, and connectionist temporal classification. In particular, the introduction of the self-attention mechanism captures global dependencies among characters, thereby enhancing the model’s understanding of license plate character sequences. Connectionist temporal classification, by learning the correspondence between characters, eliminates the need for additional alignment information, allowing the model to directly recognize character sequences of different lengths from images. Additionally, we propose an encoder–decoder architecture that, through multiple convolutional layers and residual blocks, captures more contextual information and global relationships in noisy and blurry images without sacrificing accuracy. Experimental evaluations using the CMPD dataset demonstrate that EAND-LPRM outperforms previous models, proving its effectiveness in various scenarios. This suggests that the EAND-LPRM method provides a better solution for license plate recognition tasks in real-world traffic scenarios, which is significant for the future of license plate recognition technology.

Although the model performs well in license plate recognition tasks in complex traffic environments, it has one drawback: it does not directly recognize double-layer license plates but adopts a strategy of concatenating them before recognition. Therefore, the features extracted by the model may become more complex and inaccurate. To further explore the effectiveness of EAND-LPRM, we plan to extend the transformer model structure to consider the recognition of double-layer license plates in complex scenarios.

Author Contributions

Conceptualization, X.D. and S.C.; methodology, Z.L.; software, S.C.; validation, Q.N., Z.L. and S.C.; formal analysis, X.D.; data curation, S.C.; writing—original draft preparation, Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Research Program of Xiamen University Technology, grant number XPDKT20029, the Natural Science Foundation of Fujian Province under Grant 2023J011427, and the Natural Science Foundation of Xiamen under Grant 3502Z20227067.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Cao, Y. Investigation of a Convolutional Neural Network-Based Approach for License Plate Detection. J. Optics 2024, 53, 697–703. [Google Scholar] [CrossRef]
  2. Ke, X.; Zeng, G.; Guo, W. An Ultra-Fast Automatic License Plate Recognition Approach for Unconstrained Scenarios. IEEE Trans. Intell. Transp. 2023, 5, 5172–5185. [Google Scholar] [CrossRef]
  3. Chen, Q.; Li, M.; Wang, C.; Liu, X.; Tang, J. Cycle-Based Estimation On Lane-Level Queue Length at Isolated Signalized Intersection Using License Plate Recognition Data. J. Transp. Eng. Part A. Syst. 2023, 149, 04022123. [Google Scholar] [CrossRef]
  4. Ramajo-Ballester, Á.; Moreno, J.M.A.; de la Escalera Hueso, A. Dual License Plate Recognition and Visual Features Encoding for Vehicle Identification. Robot. Auton. Syst. 2024, 172, 104608. [Google Scholar] [CrossRef]
  5. Liu, S.; Xie, Y.; Wu, L.; Song, K.; Gong, K.; Duan, X. A Single-Stage Automatic License Plate Recognition Network with Balanced-Iou Loss. J. Phys. Conf. Ser. 2023, 1, 012039. [Google Scholar] [CrossRef]
  6. Sultan, F.; Khan, K.; Shah, Y.A.; Shahzad, M.; Khan, U.; Mahmood, Z. Towards Automatic License Plate Recognition in Challenging Conditions. Appl. Sci. 2023, 1, 3956. [Google Scholar] [CrossRef]
  7. Kim, D.; Kim, J.; Park, E. AFA-Net: Adaptive Feature Attention Network in Image Deblurring and Super-Resolution for Improving License Plate Recognition. Comput. Vis. Image Underst. 2024, 238, 103879. [Google Scholar] [CrossRef]
  8. Anmol Pattanaik, R.C.B. Enhancement of License Plate Recognition Performance Using Xception with Mish Activation Function. Multimed. Tools Appl. 2023, 11, 16793–16815. [Google Scholar] [CrossRef]
  9. Gong, Y.; Deng, L.; Tao, S.; Lu, X.; Wu, P.; Xie, Z.; Ma, Z.; Xie, M. Unified Chinese License Plate Detection and Recognition with High Efficiency. J. Vis. Commun. Image R. 2022, 3, 103541. [Google Scholar] [CrossRef]
  10. Pham, T. Effective Deep Neural Networks for License Plate Detection and Recognition. Vis. Comput. 2023, 3, 927–941. [Google Scholar] [CrossRef]
  11. Rajebi, S.; Pedrammehr, S.; Mohajerpoor, R. A License Plate Recognition System with Robustness Against Adverse Environmental Conditions Using Hopfield’s Neural Network. Axioms 2023, 1, 424. [Google Scholar] [CrossRef]
  12. Schirrmacher, F.; Lorch, B.; Maier, A.; Riess, C. Benchmarking Probabilistic Deep Learning Methods for License Plate Recognition. IEEE Trans. Intell. Transp. 2023, 9, 9203–9216. [Google Scholar] [CrossRef]
  13. Kabiraj, A.; Pal, D.; Ganguly, D.; Chatterjee, K.; Roy, S. Number Plate Recognition from Enhanced Super-Resolution Using Generative Adversarial Network. Multimed. Tools Appl. 2023, 82, 13837–13853. [Google Scholar] [CrossRef]
  14. Kothai, G.; Povammal, E.; Amutha, S.; Deepa, V. An Efficient Deep Learning Approach for Automatic License Plate Detection with Novel Feature Extraction. Procedia Comput. Sci. 2024, 235, 2822–2832. [Google Scholar]
  15. Li, X.; Zhang, Z.; Li, Q.; Zhu, J. Enhancing Soil Moisture Forecasting Accuracy with REDF-LSTM: Integrating Residual En-Decoding and Feature Attention Mechanisms. Water 2024, 16, 1376. [Google Scholar] [CrossRef]
  16. Jiang, Y.; Jiang, F.; Luo, H.; Lin, H.; Yao, J.; Liu, J.; Ren, J. An Efficient and Unified Recognition Method for Multiple License Plates in Unconstrained Scenarios. IEEE Trans. Intell. Transp. 2023, 5, 5376–5389. [Google Scholar] [CrossRef]
  17. Türkyılmaz, İ.; Kaçan, K. License Plate Recognition System Using Artificial Neural Networks. ETRI J. 2017, 2, 163–172. [Google Scholar] [CrossRef]
  18. Wang, D.A.; Tian, Y.A.; Geng, W.A.; Zhao, L.A.; Gong, C.A. Lpr-Net: Recognizing Chinese License Plate in Complex Environments. Pattern Recogn. Lett. 2020, 1, 148–156. [Google Scholar] [CrossRef]
  19. Wang, L.; Cao, C.; Zou, B.; Ye, J.; Zhang, J. License Plate Recognition Via Attention Mechanism. Comput. Mater. Contin. 2023, 1, 1801–1814. [Google Scholar] [CrossRef]
  20. Wei, S.; Li, X.; Yao, Y.; Yang, S. A Novel Short-Memory Sequence-Based Model for Variable-Length Reading Recognition of Multi-Type Digital Instruments in Industrial Scenarios. Algorithm 2023, 16, 192. [Google Scholar] [CrossRef]
  21. Xu, H.; Zhou, X.; Li, Z.; Liu, L.; Li, C.; Shi, Y. Eilpr: Toward End-to-End Irregular License Plate Recognition Based on Automatic Perspective Alignment. IEEE Trans. Intell. Transp. 2022, 3, 2586–2595. [Google Scholar] [CrossRef]
  22. Xu, Z.; Yang, W.; Meng, A.; Lu, N.; Huang, H.; Ying, C.; Huang, L. Towards End-to-End License Plate Detection and Recognition: A Large Dataset and Baseline. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 261–277. [Google Scholar]
  23. Yu, H.; Wang, X.; Shao, Y.; Qin, F.; Chen, B.; Gong, S. Research on License Plate Location and Recognition in Complex Environment. J. Real-Time Image Process. 2022, 4, 823–837. [Google Scholar] [CrossRef]
  24. Zhang, W.; Lu, J.; Zhang, J.; Li, X.; Zhao, Q. Research on the Algorithm of License Plate Recognition Based on Mpgan Haze Weather. IEICE Trans. Inf. Syst. 2022, 5, 1085–1093. [Google Scholar] [CrossRef]
  25. Rao, Z.; Yang, D.; Chen, N.; Liu, J. License Plate Recognition System in Unconstrained Scenes Via a New Image Correction Scheme and Improved CRNN. Expert. Syst. Appl. 2024, 243, 122878. [Google Scholar] [CrossRef]
  26. Jawale, M.A.; William, P.; Pawar, A.B.; Marriwala, N. Implementation of Number Plate Detection System for Vehicle Registration Using Iot and Recognition Using CNN. Meas. Sens. 2023, 27, 100761. [Google Scholar] [CrossRef]
  27. Tao, L.; Hong, S.; Lin, Y.; Chen, Y.; He, P.; Tie, Z. A Real-Time License Plate Detection and Recognition Model in Unconstrained Scenarios. Sensors 2024, 24, 2791. [Google Scholar] [CrossRef]

Figure 1. Structure diagram of the EAND-LPRM model.

Figure 2. Channel transformation module.

Figure 3. Self-made dataset of Chinese license plate images with multiple complex conditions.

Figure 4. Model evaluation scatter plot.

Table 1. Comparison of results on the public dataset CCPD.

Model     | Best Accuracy (%) | Avg Accuracy (%) | Test Accuracy (%) | Loss
CRNN      | 89                | 87               | 88                | 0.190
ResNet18  | 95                | 93               | 90                | 0.064
ResNet50  | 96                | 93               | 91                | 0.066
EAND-LPRM | 98                | 97               | 94                | 0.039

Table 2. Comparison of experimental results on the self-made dataset CMPD.

Model     | Best Accuracy (%) | Avg Accuracy (%) | Test Accuracy (%) | Loss
CRNN      | 81                | 72               | 80                | 0.231
ResNet18  | 88                | 83               | 89                | 0.130
ResNet50  | 88                | 83               | 91                | 0.160
EAND-LPRM | 94                | 89               | 94                | 0.111

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.


© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).