Time-Interval-Guided Event Representation for Scene Understanding
Boxuan Wang, Wenjun Yang, Kunqi Wu, Rui Yang, Jiayue Xie, Huixiang Liu

TL;DR
This paper introduces a new method to improve scene understanding using event cameras, especially in low-light conditions, by converting sparse event data into detailed images.
Contribution
The novel method converts sparse event streams into dense intensity frames without relying on light sources or motion.
Findings
Event cameras generate events even without brightness changes, influenced by noise.
Events tend to occur in pairs, with time intervals correlated to scene light intensity.
The proposed method enables static imaging with event cameras, useful for HDR imaging.
Abstract
The recovery of scenes under extreme lighting conditions is pivotal for effective image analysis and feature detection. Traditional cameras face challenges with low dynamic range and limited spectral response in such scenarios. In this paper, we advocate for the adoption of event cameras to reconstruct static scenes, particularly those in low illumination. We introduce a new method to elucidate the phenomenon where event cameras continue to generate events even in the absence of brightness changes, highlighting the crucial role played by noise in this process. Furthermore, we substantiate that events predominantly occur in pairs and establish a correlation between the time interval of event pairs and the relative light intensity of the scene. A key contribution of our work is the proposal of an innovative method to convert sparse event streams into dense intensity frames without…
- R&D Program of Beijing Municipal Education Commission
- Young Backbone Teacher Support Plan of Beijing Information Science and Technology University
- Xingguang Fundation of Beijing Information Science and Technology University
1. Introduction
Recovering scenes under extreme lighting conditions presents a challenge for traditional frame-based cameras, which are often limited in capturing a broad luminance range in real-world scenarios due to their low dynamic ranges [1]. Event cameras, exemplified by the dynamic vision sensor (DVS) [2], offer a novel solution by adopting a distinct imaging mechanism. Instead of measuring the absolute light intensity of a scene to generate images, event cameras respond exclusively to changes in brightness, producing events asynchronously. In this way, event cameras offer several advantages [3], including exceptionally high temporal resolution (on the order of microseconds), a high dynamic range (up to 120 dB), low latency, and low power consumption. Since event cameras exclusively respond to changes in light intensity, they are mainly deployed in dynamic visual fields [4,5,6,7]. Researchers use event cameras in place of frame cameras for tasks such as tracking [8,9,10,11], SLAM [12,13], and dynamic obstacle avoidance [14,15,16,17].
In addition to the progress in static scene analysis, recent efforts have focused on dynamic object pose estimation using event cameras. Liu et al. proposed a line-based method that extracts object lines directly from events and estimates poses without known 2D-3D correspondences, followed by continuous tracking via robust event-line alignment [18]. Extending this idea to aerospace applications, Liu et al. further introduced a stereo event-based pose tracking framework for uncooperative spacecraft, combining line reconstruction from stereo event streams with continuous optimization over 6-DOF motion parameters [19]. Complementing these advances, Yu et al. investigated dynamic visual scene decoding from retinal neural spikes, leveraging deep neural networks to reconstruct visual stimuli and assess decoding quality under varying noise and trial conditions, offering insights into visual neural coding and its implications for brain–machine interfaces [20]. These works collectively demonstrate the versatility of event-based sensing and neural decoding in addressing high-speed perception and cognitive reconstruction tasks. Domínguez-Morales et al. [21] designed a real-time neuromorphic stereo vision system with a novel FPGA-based calibration method inspired by human vision. Jiao et al. [22] proposed a comprehensive LiDAR and event camera calibration framework based on automatic checkerboard tracking and globally optimal optimization. Muglikar et al. [23] developed a calibration method for event cameras using neural network-based image reconstruction without requiring active illumination. Zhang [24] introduced a flexible and simple camera calibration technique using a planar pattern observed from multiple orientations.
In this paper, we propose a new method to model noise behavior and introduce a novel method for reconstructing static scenes. Specifically, we establish a correlation between the temporal information of events and the relative light intensity of the scene, facilitating precise reconstruction with the sole requirement of extracting and analyzing event timestamps. To the best of our knowledge, this study is the first to systematically elucidate the role of noise behavior in event triggering in static scenes. It is also the first to demonstrate that the high temporal resolution of event cameras, particularly timestamps, can be leveraged for the recovery of static scenes. Our contributions are summarized as follows:
- We propose a new method termed the “noise-based event triggering mechanism”. This method provides a probabilistic perspective to elucidate the influence of noise behavior on event triggering in static scenes. It also outlines the relationship between the event generation rate and the light intensity of the scene.
- We present the concept of “event pairs” and demonstrate that events predominantly occur in pairs. We establish the relationship between the time interval of event pairs and light intensity. Based on this observation, we propose an innovative method to convert the high temporal resolution of event signals to the relative light intensity of the static scene.
- We developed a practical application based on our method, namely feature detection under low illumination. Our demonstrations indicate that the time-interval-based method outperforms the integration-based method in detail recovery, thereby expanding the potential applications of event cameras in static scenarios.
2. Related Works
This study presents a novel approach to static imaging using event cameras, with the primary objective of addressing issues such as low contrast and texture loss in recovering static scenes under low-illumination conditions. This work falls within the intersection of event-based reconstruction and HDR imaging, so we will review the latest research in the related areas.
Benefiting from the high temporal resolution and dynamic range of event cameras, researchers have attempted to leverage these advantages to tackle demanding visual tasks, potentially replacing conventional frame cameras. Their initial challenge involves addressing the incompatibility between event cameras and 2D image algorithms, thus propelling the advancement of the field of event-based reconstruction. Henri Rebecq et al. [5] introduced a recurrent network that segments the incoming event stream into sequential spatiotemporal windows of events to reconstruct high-frame-rate (HFR) videos. Lin Wang et al. [25] proposed a method based on conditional generative adversarial networks (cGANs), employing stacks of spacetime coordinates of events as input to reconstruct high-dynamic-range (HDR) images and HFR videos. Liyuan Pan et al. [26] proposed an event-based dual integral (EDI) model, which integrates regularization terms to effectively handle image blur challenges and enhance the reconstruction of high-quality videos.
In addition to video reconstruction, there are also related works that concentrate on recovering scene light intensity, akin to our research. Tsuyoshi Takatani et al. [11] introduced a technique for obtaining bispectral difference images utilizing an event camera with temporally modulated illumination, enabling 3D shape reconstruction in water. Zehao Chen et al. [27] suggested utilizing event cameras to capture intensity changes on a pure diffusion sphere and formulated an analytical expression for radiation intensity and event flow, enabling indoor lighting estimation. Richard Shaw et al. [28] devised a multi-modal end-to-end learning-based HDR imaging system, which accomplishes HDR reconstruction by combining high-quality image information from RGB with complementary high frequency and dynamic range information from events. Jin Han et al. [29] proposed a method for recovering scene radiance by analyzing the transient event frequency during the split second of a light being turned on.
In contrast to approaches that attempt to integrate events over a period [11,30] or those dependent on active light sources [27,29], our method only requires recording the output from a brief static exposure of the event camera for a few seconds to accomplish all the necessary preparations.
3. Preliminaries
The event camera generates events not only when it detects changes in light intensity but also in static scenes without apparent variations. Thomas Finateu et al. [31] pointed out that the output of the event camera includes normal events caused by changes in light intensity and some background activities. Rui Graca et al. [32] classified these activities as leak events caused by junction leakage and shot noise events, and proposed a second-order model to elucidate the relationship between RMS granular noise voltage and photocurrent. Gao et al. [33] further demonstrated this by establishing mathematical formulas to quantify the relationship between the event generation rate and photon absorption rate in static scenes.
While it is widely acknowledged that voltage fluctuations due to shot noise are the primary cause of event generation in static scenes by event cameras [34,35], the relationship between event generation rate and static scene intensity remains unclear.
4. Time Intervals of Event Pairs
4.1. Noise-Based Event Triggering
To understand why an event camera can produce a stable output even in the absence of any changes in light intensity, we recorded and analyzed several sets of raw data, where we found two phenomena.
- The event generation rate in a static scene is closely associated with the scene intensity. As depicted in Figure 1, we utilize various patches within a standard grayscale checker to represent diverse illumination levels [33]. The polyfitted curve delineates the correlation between the event rate and grayscale value.
- The majority of events in the raw stream appear in pairs, comprising one positive event coupled with one negative event, forming what we define to be an “event pair”. Figure 2 illustrates the proportion of event pairs within a set of event streams, with an average proportion of 73.34%, indicating that pairs constitute the predominant form of events.
While shot noise has been demonstrated to be the main contributor to event camera output in static scenes [32,34], neglecting other noise sources and random occurrences that could cause fluctuations in current or voltage weakens the robustness of the current noise theory. This is particularly evident under low illumination, where the photocurrent is lower and the pixel is more susceptible to such fluctuations. Rahul Sarpeshkar et al. [36] have demonstrated the intrinsic unity of shot noise and thermal noise occurring in the low-power subthreshold region of the operation of an MOS transistor. It is therefore crucial to employ a unified theory to explain the behavior of noise in event cameras. As the noise is filtered by the photoreceptor output stage under high light intensities, this paper concentrates on the behavior of noise under low illumination.
We assume that the noise is white and Gaussian [34,37,38], appearing constantly and randomly at each pixel. The occurrence of noise increases the pixel voltage, triggering a positive event when it crosses the ON threshold. Subsequently, the noise dissipates after reaching its intensity peak, causing the pixel voltage to decrease to a low level and triggering a negative event when it crosses the OFF threshold. We define the complete cycle of noise emergence and disappearance as the “noise process” with the duration of this cycle termed as the “noise period”, as illustrated in Figure 3a. Typical relative thresholds for event cameras range from 10% to 40% [2], indicating that as the pixel voltage increases, a higher noise intensity is needed to trigger an event. We define the minimum noise intensity required to trigger an event as the “threshold noise” and any noise surpassing this threshold can trigger events at the pixel.
As the noise intensity follows a Gaussian distribution, the probability of effective noise that can trigger events can be calculated using the following formula:

$$P = \int_{n_{th}}^{\infty} f(n)\,\mathrm{d}n$$

where $n_{th}$ is the threshold noise intensity, and $f(n)$ is the probability density function of the noise intensity. Consider a scenario where there are two pixels with different values, and their threshold noises are situated at $n_{th,1}$ and $n_{th,2}$, as illustrated in Figure 3c. The probabilities of effective noise for these two pixels are represented by the blue and green regions in the figure, respectively. It is evident that pixels with lower threshold noise have a higher event generation rate, explaining why events are more likely to occur in low-illumination areas.
Based on the above analysis, the event generation rate is positively correlated with the Gaussian integral of the threshold noise. Considering that the noise voltage is inversely related to the photocurrent, as discussed in [32], and the photocurrent is dependent on the light intensity, the event generation rate is consequently negatively correlated with light intensity, following the Gaussian integral curve. The proof is presented in Section 4.3.
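The tail-probability argument in this subsection can be illustrated numerically. The following is a minimal sketch, assuming zero-mean Gaussian noise with unit standard deviation; the function name `effective_noise_probability` and the two threshold values are hypothetical and are chosen only to show that a lower threshold noise yields a higher trigger probability.

```python
import math

def effective_noise_probability(threshold, sigma=1.0, mean=0.0):
    """P(noise > threshold) for Gaussian noise: the upper-tail integral of the PDF."""
    z = (threshold - mean) / (sigma * math.sqrt(2.0))
    return 0.5 * math.erfc(z)

# Two hypothetical pixels with different threshold-noise levels (arbitrary units).
p_low_threshold = effective_noise_probability(1.0)
p_high_threshold = effective_noise_probability(2.0)

print(f"threshold 1.0 -> P = {p_low_threshold:.4f}")
print(f"threshold 2.0 -> P = {p_high_threshold:.4f}")
```

With these illustrative values, the pixel with the lower threshold is triggered by roughly 16% of noise occurrences and the other by roughly 2%, which mirrors the two shaded regions described for Figure 3c.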
4.2. Intensity Reconstruction from Event Pairs
Instead of reconstructing intensities through integration, we leverage the ultra-high temporal resolution offered by event cameras, which provides the triggering time in the microsecond order. Specifically, we establish a correlation between the time intervals of event pairs and light intensity, enhancing the contrast for low-illumination reconstruction and thereby facilitating the building of a more accurate scene intensity map.
Given the close relationship between threshold noise and pixel voltage, which directly mirrors light intensity, it is theoretically feasible to reconstruct scene intensity from noise events, provided that the noise intensity is precisely measured. Accuracy in measurement is crucial during this process, emphasizing the need for a precise metric. Considering that the noise period signifies the duration of the noise process and is consequently positively correlated with noise intensity—owing to the increased time required to reach a higher peak—we utilize the time interval of event pairs that closely align with the numerical value of the noise period to characterize noise intensity. This approach is favored not only for its high accuracy but also for its accessibility, achieved simply by reading the timestamp of an event.
Figure 4 illustrates the reconstruction of scene intensity based on the time interval of event pairs. The event stream, captured by filming a static scene for several seconds, contains event pairs in line with our theoretical description, alongside numerous single events, as depicted in Figure 4a. Consequently, it is essential to introduce a preprocessing step to enhance the ratio of event pairs. Figure 4b presents a schematic diagram of event pairs triggering. For a given pixel, the average time interval of collected event pairs during the specified duration can be calculated using the following formula:

$$\bar{T} = \frac{1}{N} \sum_{i=1}^{N} \left( t_i^{-} - t_i^{+} \right)$$

where $t_i^{+}$ and $t_i^{-}$ are the timestamps of the positive and negative events of the $i$-th event pair, and N represents the total number of event pairs collected during the specified duration. The scene intensity can be reconstructed by normalizing the average time interval of all pixels to the [0, 255] interval, as depicted in Figure 4c.
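The per-pixel averaging and normalization described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the event tuple layout `(x, y, t_us, polarity)`, the function names, and the pairing rule (an ON event matched with the next OFF event at the same pixel, with unmatched events ignored) are all assumptions, since the text does not specify the preprocessing step in detail.

```python
from collections import defaultdict

def mean_pair_intervals(events):
    """Return {(x, y): mean ON-to-OFF interval in microseconds}.

    events: iterable of (x, y, t_us, polarity), sorted by time, polarity +1 or -1.
    Assumed pairing rule: an ON event followed by the next OFF event at the
    same pixel forms one pair; unmatched events are ignored.
    """
    pending_on = {}                 # pixel -> timestamp of an unmatched ON event
    intervals = defaultdict(list)   # pixel -> list of pair intervals
    for x, y, t_us, polarity in events:
        pixel = (x, y)
        if polarity > 0:
            pending_on[pixel] = t_us              # keep the most recent ON event
        elif pixel in pending_on:
            intervals[pixel].append(t_us - pending_on.pop(pixel))
    return {p: sum(v) / len(v) for p, v in intervals.items()}

def normalize_to_8bit(mean_intervals):
    """Linearly rescale mean intervals to integers in [0, 255]."""
    lo, hi = min(mean_intervals.values()), max(mean_intervals.values())
    span = (hi - lo) or 1.0   # avoid division by zero when all values are equal
    return {p: round(255 * (v - lo) / span) for p, v in mean_intervals.items()}

# Hypothetical event stream for three pixels (timestamps in microseconds).
events = [
    (0, 0, 100, +1), (1, 0, 120, +1), (0, 0, 140, -1), (1, 0, 220, -1),
    (0, 0, 300, +1), (0, 0, 360, -1),
    (2, 0, 400, -1),                    # single OFF event, ignored
    (2, 0, 500, +1), (2, 0, 700, -1),
]
means = mean_pair_intervals(events)
image = normalize_to_8bit(means)
print(means)
print(image)
```

Longer mean intervals map to brighter output values here, following the positive correlation between time interval and grayscale that the paper reports.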
4.3. Experimental Verification
In this section, we describe experiments aimed at validating our noise-based triggering method and the time-interval-based reconstruction method.
A fundamental premise of the noise-based triggering method is the assumption that the noise is white noise and conforms to a Gaussian distribution. This assumption enables us to infer that the event generation rate is correlated with light intensity, in accordance with the Gaussian integral curve. To validate this, we need to systematically alter the light intensity and record the corresponding event generation rates at different levels of illumination.
For ease of operation and quantitative analysis, we designed a grayscale checker to simulate gradual changes in light intensity, as illustrated in Figure 5a. The static imaging result is presented in Figure 5b. Using the red line as a reference, we calculated the average number of events triggered in each pixel column along the baseline and plotted how the event rate changes with grayscale to explore its relationship with light intensity, as shown in Figure 5c. The curve is obtained through polynomial fitting using the least squares method. To ascertain whether it follows a Gaussian integral, we derive its derivative curve, as illustrated in Figure 5d, where the primary body aligns with a Gaussian distribution.
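The logic of this check, that a curve shaped like a Gaussian integral has a Gaussian-shaped derivative, can be reproduced on synthetic data. The sketch below does not use the paper's measurements: the curve is generated from a Gaussian tail with hypothetical parameters `MEAN` and `SIGMA`, and a central finite difference stands in for differentiating the least-squares polynomial fit used in the paper.

```python
import math

def gaussian_tail(x, mean, sigma):
    """Upper-tail integral of a Gaussian PDF, evaluated at x."""
    return 0.5 * math.erfc((x - mean) / (sigma * math.sqrt(2.0)))

def central_difference(ys, step):
    """Numerical derivative at the interior samples of an evenly spaced curve."""
    return [(ys[i + 1] - ys[i - 1]) / (2.0 * step) for i in range(1, len(ys) - 1)]

# Synthetic stand-in for the measured curve: event rate versus grayscale value.
MEAN, SIGMA, STEP = 128.0, 30.0, 1.0   # hypothetical parameters
grays = [g * STEP for g in range(0, 256)]
rates = [gaussian_tail(g, MEAN, SIGMA) for g in grays]

slopes = central_difference(rates, STEP)   # defined at grays[1:-1]
magnitudes = [-s for s in slopes]          # the rate falls, so slopes are negative
peak_height = max(magnitudes)
peak_gray = grays[1 + magnitudes.index(peak_height)]
expected_height = 1.0 / (SIGMA * math.sqrt(2.0 * math.pi))

print(peak_gray, peak_height, expected_height)
```

The derivative magnitude peaks at the assumed mean and its height matches the Gaussian density peak, which is the kind of agreement the comparison in Figure 5d looks for on real data.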
To validate the efficacy of our time-interval-based reconstruction method, it is crucial to establish a direct mapping between the time intervals of event pairs and light intensity. We achieve this by calculating the average time interval for each pixel column along the baseline of the grayscale checker. The pixel values are obtained from the RGB camera to ensure the ground truth, and the average time intervals are measured from the pixels at the corresponding locations on the event camera. The mapping from the time interval of event pairs to grayscale is illustrated in Figure 6, revealing a robust positive correlation.
4.4. Time-Interval-Based Method vs. Integration-Based Method
Let us consider a scenario where noise of the same intensity appears on three pixels with different values, as depicted in Figure 7. Ideally, the reconstruction results for these three pixels should be distinct. However, the tiny difference in pixel voltage results in an indistinguishable triggering outcome: each pixel produces one positive event and one negative event. This occurrence is frequent in low-illumination conditions, where pixel voltages are low and the disparity in threshold noise intensity is consequently slight. This subtle difference can readily induce false triggering. The integration-based method merely tallies the number of triggered events and fails to extract information for differentiation. Consequently, this approach yields a low-contrast reconstruction.
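The difference between the two cues can be made concrete with a toy example. The timestamps below are hypothetical, not measured data; they are chosen so that three pixels produce the same number of events while their ON-to-OFF intervals differ.

```python
# Hypothetical records for three pixels: (ON timestamp, OFF timestamp) in microseconds.
pixel_pairs = {
    "dark":   [(100, 130), (500, 534)],
    "medium": [(110, 160), (520, 566)],
    "bright": [(105, 185), (510, 586)],
}

def integration_value(pairs):
    """Integration-based cue: the number of events (two per pair)."""
    return 2 * len(pairs)

def interval_value(pairs):
    """Time-interval cue: mean ON-to-OFF interval in microseconds."""
    return sum(t_off - t_on for t_on, t_off in pairs) / len(pairs)

counts = {name: integration_value(p) for name, p in pixel_pairs.items()}
intervals = {name: interval_value(p) for name, p in pixel_pairs.items()}

print(counts)      # identical for all three pixels
print(intervals)   # distinct for each pixel
```

Counting alone assigns all three pixels the same value, while the mean interval separates them, which is the contrast advantage the section argues for.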
High Contrast. Previous research [11,27,33] has indicated that intensity information can be reconstructed by integrating events over a period of time. However, this method encounters challenges in reconstructing intricate details, such as the texture of objects in the scene, as depicted in Figure 8b. This is attributed to the low contrast resulting from insufficient information, which can be explained by our method.
The time-interval-based method addresses this issue by leveraging the event camera’s high temporal resolution, which provides triggering times with microsecond precision. By computing the time interval of each event pair, we achieve a high contrast that enables the distinction of pixels with small intensity differences. This promotes a more accurate reconstruction of the scene intensity map, particularly capturing texture details under low-illumination conditions, as illustrated in Figure 8c.
To comprehensively evaluate our method, we compared the proposed approach with mainstream reconstruction techniques, including traditional integral-based methods and the method by Gao et al. [33]. The qualitative comparison results are presented in Figure 8, while the quantitative evaluation is summarized in Table 1. The quantitative analysis demonstrated that our method achieves superior PSNR scores compared to existing approaches. Furthermore, the qualitative results indicated that our reconstruction preserves finer details, such as textures, more effectively than do the competing methods.
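For reference, PSNR is conventionally computed from the mean squared error between a reference image and a reconstruction. The sketch below uses the standard definition for 8-bit images; the paper does not state its exact evaluation settings, so the peak value of 255 and the flattened-list image representation are assumptions, and the pixel values are made up for illustration.

```python
import math

def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(reconstruction):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf   # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

reference = [0, 64, 128, 255]
reconstruction = [10, 54, 138, 245]   # every pixel is off by 10 gray levels

score = psnr(reference, reconstruction)
print(f"{score:.2f} dB")
```

A higher PSNR indicates a smaller mean squared error relative to the peak value, so a reconstruction closer to the reference scores higher.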
5. HDR Imaging Using Event Cameras
In recent years, high-end HDR imaging technologies have advanced rapidly [39]; however, their widespread adoption remains limited due to cost and accessibility constraints.
We suggest employing event cameras for reconstructing static scenes under low illumination, addressing challenging vision tasks that conventional frame cameras struggle with due to their limited dynamic range. Two main points support this.
- Event cameras exhibit a dynamic range exceeding 120 dB, whereas consumer frame cameras currently available offer 40 dB. This extended range enables event cameras to capture signals under extreme lighting conditions. Theoretically, it is feasible to extract valuable information from the noisy output of event cameras.
- In contrast to the integration-based method, our approach transforms high temporal resolution into the relative light intensity of the scene. This conversion enhances contrast, enabling us to achieve more precise reconstructions.
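To put the decibel figures quoted above in perspective, they can be converted to intensity ratios. The sketch assumes the 20·log10 convention commonly used for image-sensor dynamic range; the function name is hypothetical.

```python
def db_to_intensity_ratio(dynamic_range_db):
    """Convert a dynamic range in dB to a max/min intensity ratio (20*log10 convention)."""
    return 10.0 ** (dynamic_range_db / 20.0)

event_camera_ratio = db_to_intensity_ratio(120.0)   # about 1,000,000 : 1
frame_camera_ratio = db_to_intensity_ratio(40.0)    # about 100 : 1
extra_decades = (120.0 - 40.0) / 20.0               # additional decades of intensity

print(event_camera_ratio, frame_camera_ratio, extra_decades)
```

Under this convention, the 80 dB gap corresponds to four additional decades of scene intensity.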
We effectively showcased the benefits of our time-interval-based approach through an experiment reconstructing an extremely low-light scene using an event camera. We successfully detected the texture of the table, which remained unseen by an RGB camera or by the integration-based method, as illustrated in Figure 9.
In Figure 9d, the ground truth from the RGB camera captured under normal lighting conditions is presented. Figure 9a displays an image taken by the RGB camera with a 5 s static exposure under low lighting, where the texture is challenging to discern. We reconstructed the scene from an event stream captured during a 5 s static exposure using both the integration-based method (depicted in Figure 9b) and the time-interval-based method (depicted in Figure 9c). The resulting raw frames were then denoised using the BM3D algorithm [40], as illustrated in Figure 9e,f. It can be seen that our method significantly outperformed traditional methods in reconstructing images after denoising.
For a quantitative assessment of the imaging capabilities under low illumination for each method (RGB camera, integration-based method, time-interval-based method), we selected the same pixel area in each image and computed the standard deviation of the pixel values within this area. The results from six independent measurements are depicted in Figure 10. The method closest to the ground truth was the time-interval-based method combined with BM3D denoising, aligning with the observations in Figure 9. Additionally, we calculated the average distance of each method from the ground truth, as presented in Figure 11. The time-interval-based method surpassed both the RGB camera and the integration-based method, highlighting the HDR imaging capability of our method under low illumination.
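The region statistic used in this assessment can be sketched as follows. The images are tiny made-up arrays, and the distance measure (mean absolute difference between a method's standard deviations and the ground truth's across measurements) is an assumption, since the text does not define the distance precisely.

```python
import statistics

def region_std(image, top, left, height, width):
    """Population standard deviation of pixel values in a rectangular region."""
    values = [image[r][c]
              for r in range(top, top + height)
              for c in range(left, left + width)]
    return statistics.pstdev(values)

def mean_abs_distance(method_stds, truth_stds):
    """Average absolute gap between a method's values and the ground truth's."""
    gaps = [abs(m - t) for m, t in zip(method_stds, truth_stds)]
    return statistics.fmean(gaps)

# Hypothetical 2x2 regions: a textureless patch and a textured patch.
flat = [[10, 10], [10, 10]]
textured = [[0, 20], [20, 0]]

flat_std = region_std(flat, 0, 0, 2, 2)
textured_std = region_std(textured, 0, 0, 2, 2)
distance = mean_abs_distance([9.0, 11.0], [10.0, 10.0])

print(flat_std, textured_std, distance)
```

A region whose texture is lost has a standard deviation near zero, so a method whose standard deviation stays close to that of the ground truth is preserving more of the texture.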
6. Conclusions
In this paper, we propose a new method to model noise behavior, elucidating its impact on event cameras for generating events in static scenes. Additionally, we introduce the concept of event pairs and establish the connection between the time interval of event pairs and the relative light intensity of the scene. Building on this theoretical foundation, we present a novel method to achieve high-contrast and accurate reconstruction of static scenes. This method also enables other applications, such as HDR imaging.
References
- 1. Chen X., Liu Y., Zhang Z., Qiao Y., Dong C. HDRUNet: Single image HDR reconstruction with denoising and dequantization. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Virtual, 19–25 June 2021, pp. 354–363.
- 2. Lichtsteiner P., Posch C., Delbruck T. A 128 × 128 120 dB 15 μs latency asynchronous temporal contrast vision sensor. IEEE J. Solid-State Circuits 2008, 43, 566–576. doi:10.1109/JSSC.2007.914337
- 3. Gallego G., Delbrück T., Orchard G., Bartolozzi C., Taba B., Censi A., Leutenegger S., Davison A.J., Conradt J., Daniilidis K. Event-based vision: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 154–180. doi:10.1109/TPAMI.2020.3008413. PMID: 32750812
- 4. Kim H., Handa A., Benosman R., Ieng S.-H., Davison A.J. Simultaneous mosaicing and tracking with an event camera. IEEE J. Solid-State Circuits 2008, 43, 566–576.
- 5. Rebecq H., Ranftl R., Koltun V., Scaramuzza D. Events-to-video: Bringing modern computer vision to event cameras. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019, pp. 3857–3866.
- 6. Scheerlinck C., Barnes N., Mahony R. Continuous-time intensity estimation using event cameras. Proceedings of the Asian Conference on Computer Vision (ACCV), Perth, Australia, 2–6 December 2018, pp. 308–324.
- 7. Zou Y., Zheng Y., Takatani T., Fu Y. Learning to reconstruct high speed and high dynamic range videos from events. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, 19–25 June 2021, pp. 2024–2033.
- 8. Barranco F., Fermuller C., Ros E. Real-time clustering and multi-target tracking using event-based sensors. Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018, pp. 5764–5769.
