Research - Display Algorithms and Frame Buffer Techniques

Deep and Fast Approximate Order Independent Transparency

G. Tsopouridis, A. A. Vasilakis, I. Fudos, Computer Graphics Forum 43:e15071, 2024. https://doi.org/10.1111/cgf.15071.

Abstract. We present a machine learning approach for efficiently computing order independent transparency (OIT) by deploying a lightweight neural network implemented fully on shaders. Our method is fast, requires a small constant amount of memory (depends only on the screen resolution and not on the number of triangles or transparent layers), is more accurate than previous approximate methods, works for every scene without setup and is portable to all platforms, running even on commodity GPUs. Our method requires a rendering pass to extract all features that are subsequently used to predict the overall OIT pixel colour with a pre-trained neural network. We provide a comparative experimental evaluation and shader source code of all methods for reproduction of the experiments.

Downloads: official paper link
Neural Moment Transparency

G. Tsopouridis, A. A. Vasilakis, I. Fudos, proc. Eurographics (short paper), 2024.

Abstract. We have developed a machine learning approach to efficiently compute per-fragment transmittance, using transmittance composed and accumulated with moment statistics, on a fragment shader. Our approach excels in achieving superior visual accuracy for computing order-independent transparency (OIT) in scenes with high depth complexity when compared to prior art.

Downloads: official paper link
Deep Hybrid Order-Independent Transparency

G. Tsopouridis, I. Fudos, A. A. Vasilakis, The Visual Computer, Proc. CGI 2022, 2022.

Abstract. Correctly compositing transparent fragments is an important and long-standing open problem in real-time computer graphics. Multifragment rendering is considered a key solution to providing high-quality order-independent transparency at interactive frame rates. To achieve that, practical implementations severely constrain the overall memory budget by adopting bounded fragment configurations such as the k-buffer. Relying on an iterative trial-and-error procedure, however, where the value of k is manually configured per case scenario, can inevitably result in bad memory utilization and view-dependent artifacts. To this end, we introduce a novel intelligent k-buffer approach that performs a non-uniform per pixel fragment allocation guided by a deep learning prediction mechanism. A hybrid scheme is further employed to facilitate the approximate blending of non-significant (remaining) fragments and thus contribute to a better overall final color estimation. An experimental evaluation substantiates that our method outperforms previous approaches when evaluating transparency in various high depth-complexity scenes.
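
The hybrid idea in the abstract above (exact compositing for the most significant fragments, approximate blending for the rest) can be illustrated with a minimal CPU sketch. This is not the authors' implementation: the function name `composite_hybrid` is hypothetical, `k` is passed in by hand rather than predicted by a network, and the tail approximation is a plain alpha-weighted average chosen here only because it is order independent.

```python
from math import prod

def composite_hybrid(fragments, k, background=(0.0, 0.0, 0.0)):
    """Composite (depth, rgb, alpha) fragments for one pixel.

    The k nearest fragments are blended exactly, front to back.
    The remaining fragments are merged into one approximate layer
    (assumption: alpha-weighted average colour, combined opacity).
    """
    ordered = sorted(fragments, key=lambda f: f[0])
    core, tail = ordered[:k], ordered[k:]

    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still passing through

    # Exact front-to-back "over" compositing of the k nearest fragments.
    for _, rgb, alpha in core:
        for c in range(3):
            color[c] += transmittance * alpha * rgb[c]
        transmittance *= 1.0 - alpha

    # Approximate, order-independent blend of everything else.
    if tail:
        alpha_sum = sum(alpha for _, _, alpha in tail)
        tail_alpha = 1.0 - prod(1.0 - alpha for _, _, alpha in tail)
        if alpha_sum > 0.0:
            for c in range(3):
                mean_c = sum(alpha * rgb[c] for _, rgb, alpha in tail) / alpha_sum
                color[c] += transmittance * tail_alpha * mean_c
        transmittance *= 1.0 - tail_alpha

    return tuple(color[c] + transmittance * background[c] for c in range(3))


if __name__ == "__main__":
    frags = [(0.5, (1.0, 0.0, 0.0), 0.5), (0.2, (0.0, 0.0, 1.0), 0.5)]
    print(composite_hybrid(frags, k=2))  # (0.25, 0.0, 0.5)
```

With a single fragment in the tail the approximation coincides with exact compositing; errors appear only when several tail fragments with different colours overlap, which is why the allocation of k per pixel matters.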

Downloads: author-prepared version of the paper

GitHub project: https://github.com/gtsopus/dhoit
A Survey of Multifragment Rendering

A. A. Vasilakis, K. Vardis, G. Papaioannou, Computer Graphics Forum, Proc. Eurographics (STAR), 39(2), 2020.

Abstract. In the past few years, advances in graphics hardware have fuelled an explosion of research and development in the field of interactive and real-time rendering in screen space. Following this trend, a rapidly increasing number of applications rely on multifragment rendering solutions to develop visually convincing graphics applications with dynamic content. The main advantage of these approaches is that they encompass additional rasterised geometry, by retaining more information from the fragment sampling domain, thus augmenting the visibility determination stage. With this survey, we provide an overview of and insight into the extensive, yet active research and respective literature on multifragment rendering. We formally present the multifragment rendering pipeline, clearly identifying the construction strategies, the core image operation categories and their mapping to the respective applications. We describe features and trade-offs for each class of techniques, pointing out GPU optimisations and limitations and provide practical recommendations for choosing an appropriate method for each application. Finally, we offer fruitful context for discussion by outlining some existing problems and challenges as well as by presenting opportunities for impactful future research directions.

Downloads: the paper

Media: Eurographics 2020 presentation
Variable k-Buffer using Importance Maps

A. A. Vasilakis, K. Vardis, G. Papaioannou, K. Moustakas, proc. Eurographics (short paper), 2017.

Abstract. Successfully predicting visual attention can significantly improve many aspects of computer graphics and games. Despite the thorough investigation in this area, selective rendering has not so far addressed fragment visibility determination problems. To this end, we present the first "selective multi-fragment rendering" solution that alters the classic k-buffer construction procedure from a fixed k to a variable k per-pixel fragment allocation guided by an importance-driven model. Given a fixed memory budget, the idea is to allocate more fragment layers in parts of the image that need them most or contribute more significantly to the visual result. An importance map, dynamically estimated per frame based on several criteria, is used for the distribution of the fragment layers across the image. We illustrate the effectiveness and superior quality of our approach in comparison to previous methods when performing order-independent transparency rendering in various high depth-complexity scenarios.
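
As a rough illustration of distributing a fixed memory budget according to an importance map, the sketch below splits a total number of fragment layers across pixels in proportion to integer importance values. The function name `allocate_layers`, the `k_min` floor and the largest-remainder rounding are assumptions made for this example; the paper's actual importance criteria and allocation scheme are not reproduced here.

```python
def allocate_layers(importance, budget, k_min=1):
    """Return a per-pixel k so that sum(k) == budget.

    importance: non-negative integers, one per pixel (e.g. a quantised
                importance map flattened to a list).
    budget:     total number of fragment layers available.
    k_min:      layers every pixel receives regardless of importance.
    """
    n = len(importance)
    if budget < n * k_min:
        raise ValueError("budget too small for the requested k_min")

    spare = budget - n * k_min
    weights = importance if sum(importance) > 0 else [1] * n
    total = sum(weights)

    # Integer share of the spare layers plus the remainder used for rounding.
    floors = [spare * w // total for w in weights]
    remainders = [spare * w % total for w in weights]
    alloc = [k_min + f for f in floors]

    # Largest-remainder rounding: hand out what is left, one layer at a time.
    leftover = spare - sum(floors)
    by_remainder = sorted(range(n), key=lambda i: remainders[i], reverse=True)
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return alloc


if __name__ == "__main__":
    print(allocate_layers([1, 1, 2], budget=11))  # [3, 3, 5]
```

Using integer arithmetic keeps the total exactly equal to the budget, which matters when the allocation is backed by a preallocated buffer.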

Downloads: author-prepared version of the paper, shader source code
MSAA-Based Coarse Shading for Power-Efficient Rendering on High Pixel-Density Displays

P. Mavridis, G. Papaioannou, High Performance Graphics poster and quick talk, 2015.

Abstract. Maintaining real-time frame rates at the native resolution of high pixel-density displays is very challenging, especially on power-constrained mobile devices. Decoupled sampling approaches offer a better solution to this problem, compared to rendering at a lower resolution and up-scaling, by sampling the visibility at a higher rate than shading, thus preserving the clarity of geometric edges, while reducing the cost of shading. However, this ability is rather limited in current graphics architectures, where the widely-used MSAA algorithm shades each covered primitive at least once per pixel, without directly providing the ability to compute pixel shading at a coarser rate. While various extensions of the graphics pipeline for coarse shading have been proposed, in this work we focus on a software implementation for existing GPUs. To this end, we render an intermediate render buffer at a lower pixel count, but at the same time we compensate for the loss in resolution by adding the appropriate number of MSAA sub-pixel samples, in order to guarantee at least one visibility sample per display pixel. Subsequently, a custom resolve shader is used to perform the mapping of sub-pixel MSAA samples to pixels. This simple technique effectively shades pixel blocks that contain no geometric edges more coarsely. While variations of this idea have been previously used on game consoles, a proper evaluation of the effectiveness of this method at decreasing shader invocations and energy consumption is missing from the bibliography and is our main contribution. We demonstrate our method on several test scenes with varying degrees of geometric and shading complexity and our measurements indicate a reduction of up to 45% in energy consumption.
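
The arithmetic behind "at least one visibility sample per display pixel" can be sketched as follows. The helper names and the ordered-grid sample layout are assumptions for illustration only; real MSAA sample positions are hardware and API dependent, and the custom resolve shader described above is not reproduced.

```python
def coarse_buffer_config(display_w, display_h, block_w, block_h):
    """Size of the intermediate buffer and MSAA samples needed per pixel.

    Each coarse pixel covers a block_w x block_h block of display pixels,
    so it needs block_w * block_h visibility samples.
    """
    coarse_w = -(-display_w // block_w)  # ceiling division
    coarse_h = -(-display_h // block_h)
    samples = block_w * block_h
    return coarse_w, coarse_h, samples


def resolve_lookup(x, y, block_w, block_h):
    """Map a display pixel to (coarse pixel, sample index).

    Assumes an idealised ordered grid where sample i of a coarse pixel
    sits at the centre of display pixel i of its block (row-major).
    """
    coarse_pixel = (x // block_w, y // block_h)
    sample_index = (y % block_h) * block_w + (x % block_w)
    return coarse_pixel, sample_index


if __name__ == "__main__":
    print(coarse_buffer_config(2560, 1440, 2, 2))  # (1280, 720, 4)
    print(resolve_lookup(5, 3, 2, 2))              # ((2, 1), 3)
```

In this idealised setting, halving the resolution in both dimensions calls for 4 samples per coarse pixel: shading runs once per covered primitive per coarse pixel, while visibility is still resolved per display pixel.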

Downloads: one-page abstract, poster, presentation
k+-buffer: An Efficient, Memory-Friendly and Dynamic k-buffer Framework

A. A. Vasilakis, G. Papaioannou, I. Fudos, IEEE Transactions on Visualization and Computer Graphics, 2015.

Abstract. Depth-sorted fragment determination is fundamental for a host of image-based techniques which simulate complex rendering effects. It is also a challenging task in terms of time and space required when rasterizing scenes with high depth complexity. When low graphics memory requirements are of utmost importance, k-buffer can objectively be considered as the most preferred framework which advantageously ensures the correct depth order on a subset of all generated fragments. Although various alternatives have been introduced to partially or completely alleviate the noticeable quality artifacts produced by the initial k-buffer algorithm at the expense of memory increase or performance downgrade, appropriate tools to automatically and dynamically compute the most suitable value of k are still missing. To this end, we introduce k+-buffer, a fast framework that accurately simulates the behavior of k-buffer in a single rendering pass. Two memory-bounded data structures, (i) the max-array and (ii) the max-heap, are developed on the GPU to concurrently maintain the k-foremost fragments per pixel by exploring pixel synchronization and fragment culling. Memory-friendly strategies are further introduced to dynamically (a) lessen the wasteful memory allocation of individual pixels with low depth complexity frequencies, (b) minimize the allocated size of k-buffer according to different application goals and hardware limitations via a straightforward depth histogram analysis and (c) manage local GPU cache with a fixed-memory depth-sorting mechanism. Finally, an extensive experimental evaluation is provided demonstrating the advantages of our work over all prior k-buffer variants in terms of memory usage, performance cost and image quality.

Downloads: author-prepared version of the paper, shader source code
Improving k-buffer methods via Occupancy Maps

A. A. Vasilakis, G. Papaioannou, proc. Eurographics Conference (short paper), 2015.

Abstract. In this work, we investigate an efficient approach to treat fragment racing when computing k-nearest fragments. Based on the observation that knowing the depth position of the k-th fragment we can optimally find the k-closest fragments, we introduce a novel fragment culling component by employing occupancy maps. Without any software redesign, the proposed scheme can easily be attached to any k-buffer pipeline to efficiently perform early-z culling. Finally, we report on the efficiency, memory space, and robustness of the upgraded k-buffer alternatives providing comprehensive comparison results.

Downloads: author-prepared version of the paper, shader source code
Accelerating k+ buffer using efficient fragment culling

A. A. Vasilakis, G. Papaioannou, proc. 19th Symposium on Interactive 3D Graphics and Games (Poster). ACM, San Francisco, California, pp. 129, 2015.

Abstract. In this work, we investigate an efficient approach to treat fragment racing when computing k-nearest fragments. Based on the observation that knowing the depth position of the k-th fragment we can optimally find the k-closest ones, we introduce a novel order-independent fragment culling component, easily attached to the k+ buffer pipeline. An additional rendering pass of the scene’s geometry is initially employed to construct a per pixel binary fragment occupancy discretization. Then, the nearest depth of the k-th per pixel fragment is concurrently computed by performing bit counting operations and subsequently utilized to perform early-z rejection for the k+ buffer construction process that follows. Any fragment with depth larger than this value will fail the depth test, avoiding the cost of its pixel shading execution. Note that no software modifications are required to the actual k+ buffer implementation.
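
The occupancy-based culling described above can be illustrated with a small sketch: depths are quantised into bins, a bitmask records which bins contain at least one fragment, and counting set bits yields a conservative depth bound for the k-th fragment. The function names, the 32-bin resolution and the normalised depth range [0, 1) are assumptions for this example; a shader would typically use hardware bit-count instructions rather than a loop.

```python
def build_occupancy(depths, bins=32):
    """First pass: set one bit per depth bin that holds a fragment."""
    mask = 0
    for d in depths:  # depths assumed normalised to [0, 1)
        b = min(int(d * bins), bins - 1)
        mask |= 1 << b
    return mask


def kth_depth_bound(mask, k, bins=32):
    """Far edge of the k-th occupied bin.

    The first k occupied bins hold at least k fragments, so the k-th
    nearest fragment cannot lie beyond this bound (conservative).
    """
    occupied = 0
    for b in range(bins):
        if (mask >> b) & 1:
            occupied += 1
            if occupied == k:
                return (b + 1) / bins
    return 1.0  # fewer than k occupied bins: nothing can be culled


if __name__ == "__main__":
    depths = [0.05, 0.06, 0.30, 0.55, 0.90]
    mask = build_occupancy(depths)
    bound = kth_depth_bound(mask, k=2)
    survivors = [d for d in depths if d <= bound]
    print(bound, survivors)  # 0.3125 [0.05, 0.06, 0.3]
```

Because several fragments can share a bin, the bound may let through more than k fragments, but it never rejects one of the k nearest, so the later k-buffer construction stays correct while shading fewer fragments.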

Downloads: poster summary, poster, fast-forward presentation
Practical Frame Buffer Compression

P. Mavridis, G. Papaioannou, In GPU Pro 4 (Ed.: W. Engel), CRC Press, 2013.

Abstract. In this article we present a method to directly rasterize a full color image using two color channels, instead of three, thus reducing both the consumed storage space and bandwidth during the rendering process. Exploiting the fact that the human visual system is more sensitive to variations of luminance than chrominance, the rasterizer generates fragments in the YCoCg color space and directly stores the chrominance channels at a lower resolution using a mosaic pattern. When reading from the buffer, a simple and efficient edge-directed reconstruction filter provides a very precise estimation of the original uncompressed values. We demonstrate that the quality loss from our method is negligible, while the memory and bandwidth consumption are greatly reduced. Furthermore, the reduction of the output channels results in a sizable increase in the fill-rate of the GPU rasterizer. Our method is trivial to implement, it is compatible with hardware multi-sample anti-aliasing and alpha blending and can be used with both forward and deferred rendering pipelines. Forward pipelines can benefit from the increased fill-rate, while deferred pipelines can use our method to pack more data on a limited number of render targets, something very important on existing consoles. Bandwidth savings are also extremely important on mobile platforms, where memory accesses will drain the battery.

Official book site
The Compact YCoCg Frame Buffer

P. Mavridis, G. Papaioannou, Journal of Computer Graphics Techniques, 1(1), 19-35, 2012.

Abstract. In this article we present a lossy frame-buffer compression format, suitable for existing commodity GPUs and APIs. Our compression scheme allows a full-color image to be directly rasterized using only two color channels at each pixel, instead of three, thus reducing both the consumed storage space and bandwidth during the rendering process. Exploiting the fact that the human visual system is more sensitive to fine spatial variations of luminance than of chrominance, the rasterizer generates fragments in the YCoCg color space and directly stores the chrominance channels at a lower resolution using a mosaic pattern. When reading from the buffer, a simple and efficient edge-directed reconstruction filter provides a very precise estimation of the original uncompressed values. We demonstrate that the quality loss from our method is negligible, while the bandwidth reduction results in a sizable increase in the fill rate of the GPU rasterizer.
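
A minimal sketch of the storage side of this idea is given below: each pixel is converted to YCoCg with the standard transform and keeps luminance plus only one of the two chrominance values, alternating in a checkerboard. The function names and the choice of which pixel parity stores Co or Cg are assumptions; the edge-directed reconstruction filter from the article is not reproduced.

```python
def rgb_to_ycocg(r, g, b):
    """Standard RGB -> YCoCg transform (inputs in [0, 1])."""
    y = 0.25 * r + 0.5 * g + 0.25 * b
    co = 0.5 * r - 0.5 * b
    cg = -0.25 * r + 0.5 * g - 0.25 * b
    return y, co, cg


def ycocg_to_rgb(y, co, cg):
    """Inverse transform, exact up to floating-point rounding."""
    return y + co - cg, y + cg, y - co - cg


def pack_pixel(x, y_coord, rgb):
    """Store two channels per pixel: Y plus Co or Cg in a checkerboard.

    Assumption: pixels with even (x + y) keep Co, odd ones keep Cg.
    The missing chrominance value must later be estimated from neighbours.
    """
    y, co, cg = rgb_to_ycocg(*rgb)
    return (y, co) if (x + y_coord) % 2 == 0 else (y, cg)


if __name__ == "__main__":
    red = (1.0, 0.0, 0.0)
    print(pack_pixel(0, 0, red))  # (0.25, 0.5)
    print(pack_pixel(1, 0, red))  # (0.25, -0.25)
```

Luminance is kept at full resolution while each chrominance channel is sampled at half the pixel count, which is what reduces the buffer from three channels to two.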

Downloads: paper, video, WebGL demo

Reference: BibTeX
Exploiting Multiresolution Models to Accelerate Ray Tracing

E. A. Karabassi, G. Papaioannou, C. Fretzagias and T. Theoharis, Computers & Graphics, Elsevier, 27(1), pp. 91-98, 2003.

Abstract. In this paper, it is shown how multiresolution models can be exploited in order to improve the efficiency of the ray tracing process without significant image deterioration. To this effect a set of criteria are established which determine the level of detail model that should be used for each ray–object intersection. These criteria include the distance from the observer and the amount of distortion to which a ray has been subjected before hitting the object. The resulting images are indistinguishable to the naked eye from those obtained using the maximum resolution models but require significantly less time to compute. Our method can be used in conjunction with previous ray tracing acceleration methods.

Downloads: the paper
