Neural Volumetric Representations for Real-Time 3D Scene Reconstruction Using Multi-Modal
Deep Learning (DL) is a subfield of Machine Learning (ML) that is applied in many complex domains. DL algorithms
are widely used to reconstruct 3D scenes from images collected from multiple online sources. Reconstructing 3D scenes
from 2D images without losing image quality remains challenging for existing algorithms because real scenes are complex,
with varying lighting conditions, dynamic components, and occlusions. This paper presents a novel real-time 3D scene
reconstruction approach that uses neural volumetric representations combined with a Multi-Modal Learning Algorithm (MMLA).
The proposed MMLA improves the volumetric representation of scenes by combining several modalities, such as RGB images,
depth-sensor data, and Inertial Measurement Unit (IMU) data. The MMLA combines the DeepVoxels model and the Neural
Radiance Fields (NeRF) model, a combination referred to here as the Neural Rendering technique, to learn complex patterns
in 3D scenes. A pre-trained EfficientNet model extracts the 3D reconstruction patterns and spatial structures, which are
then transferred to the proposed MMLA. The performance of the proposed MMLA is analyzed on the ShapeNet dataset, which
consists of 2D images. The experimental results show that the proposed MMLA achieves superior performance, with a
Mean Squared Error (MSE) of 0.167, a Root Mean Squared Error (RMSE) of 0.50, and a Mean Absolute Error (MAE) of 1.1.
These results may differ on other datasets.
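
The three error metrics named above can be computed as in the following minimal sketch. It is an illustration only, not the authors' evaluation code: the function name `error_metrics` and the sample occupancy values are hypothetical, and the paper does not state which quantities (voxels, pixels, or depths) its errors are measured over.

```python
import math
from typing import Dict, Sequence


def error_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Return MSE, RMSE, and MAE for two equal-length sequences of values."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    # Per-element error between prediction and ground truth.
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae}


# Hypothetical voxel occupancy values, for illustration only.
target = [0.0, 1.0, 1.0, 0.0]
predicted = [0.5, 0.5, 1.0, 0.0]
metrics = error_metrics(target, predicted)
print(metrics)
```

When all three metrics are computed over the same set of errors, RMSE is the square root of MSE and MAE cannot exceed RMSE, which gives a quick consistency check on reported values.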

The International Arab Journal of Information Technology, Vol. 22, No. 6, November 2025

Pidatala Devendrababu completed his B.Tech at Anurag Engineering College, Kodad, and his M.Tech at MITS College,
Kodad. He is currently pursuing his Ph.D. at KL University. With over 12 years of experience as an Assistant Professor,
he has served in various engineering colleges under the Jawaharlal Nehru Technological University, Hyderabad (JNTUH).
His primary areas of interest include Artificial Intelligence (AI), Machine Learning (ML), Natural Language Processing
(NLP), and Deep Learning. His academic and professional journey reflects a deep commitment to teaching and research in
the field of intelligent computing. He can be contacted at email: pidataladevendrababu@gmail.com.

Preeti Jha is an Associate Professor in the Department of Computer Science and Engineering at KLH Hyderabad. She
earned her Ph.D. and completed post-doctoral research under the guidance of Dr. Aruna Tiwari at IIT Indore. Her research
focuses on Fuzzy Clustering, Data Mining, Machine Learning, and Large-Scale Genomic Data Analysis. She is actively
involved in academic research and contributes to advancements in soft computing techniques. Dr. Jha is also a member of
the Soft Computing Research Society, reflecting her dedication to interdisciplinary research and collaboration within
the field of intelligent data analysis and computational methodologies. She can be contacted at email:
preetijha@klh.edu.in.