Foundations of Generative AI and Applied Machine Learning
Giang NGUYEN, doc. Ing. PhD.
giang.nguyen@stuba.sk
Fakulta informatiky a informačných technológii STU v Bratislave
Oleksandr LYTVYN, Ing. PhD.
oleksandr.lytvyn@stuba.sk
Fakulta informatiky a informačných technológii STU v Bratislave
Abstract:
"Applied machine learning is reshaping the way we interact with technology, turning vast amounts of data into useful insights and smarter decisions. One of its most visible and powerful applications is in recommender systems, which help people discover what to watch, read, buy, or explore next. By analyzing patterns in user behavior and preferences, recommender systems deliver personalized suggestions that make our digital experiences more efficient, enjoyable, and relevant. From streaming platforms and online shopping to educational tools and research discovery, applied machine learning and recommender systems are changing the way we access information, make choices, and connect with the world around us.
In the broader context of machine learning, Artificial Intelligence (AI) has rapidly become integrated into human life over the past few years. This integration, however, has not occurred uniformly worldwide; it has been most pronounced in technologically advanced regions such as the European Union, the United States, and East Asia, and it influences almost all domains. The speed and scale of this transformation offer extraordinary opportunities, but also carry risks of disruption and unintended consequences. Large language models are moving from being static pre-trained models to becoming adaptive, self-improving systems that can dynamically retrain on domain-specific knowledge or user feedback in near real time. Within this shift, deep learning and generative AI models are redefining traditional methods of inquiry and problem solving. Together, these technologies have the potential to accelerate knowledge discovery while reshaping human life on a societal scale, influencing institutions, economies, and cultural practices.
The promise is clear: AI can reduce the time spent on routine tasks, and it also opens new possibilities for connection and collaboration in diverse fields. In this way, AI acts as a catalyst and amplifier of human progress. However, these gains come with challenges. Rapid adoption of such technologies, particularly generative AI, raises urgent concerns about bias, misuse, ethics, and trust.
The textbook is intended for students of generative AI and applied machine learning courses, as well as the general professional public dealing with or interested in data science and AI. On the one hand, it explores the practical applications of generative AI and applied machine learning, with recommender systems highlighted as the most well-known use case, providing hands-on techniques and real-world case studies. On the other hand, it provides a brief overview of machine learning governance, including MLOps for scalable development and deployment, and privacy-preserving machine learning techniques to ensure responsible AI practices."
DOI: 10.61544/VNEV8972
References:
[1] Abbas Acar et al. “A survey on homomorphic encryption schemes: Theory and implementation”. In: ACM Computing Surveys (Csur) 51.4 (2018), pages 1–35. DOI: 10.1145/3214303 (cited on page 127).
[2] EU AI Act. European Union Artificial Intelligence Act. Accessed on 08.06.2025, European Union. 2024. URL: https://artificialintelligenceact.eu/the-act/ (cited on page 133).
[3] EU DMA Act. The Digital Markets Act: ensuring fair and open digital markets. Accessed on 08.06.2025, European Union. 2022. URL: https://commission.europa.eu/strategy-and-policy/priorities-2019-2024/europe-fit-digital-age/digital-markets-act-ensuring-fair-and-open-digital-markets_en (cited on page 133).
[4] EU DSA Act. European Union Digital Services Act. Accessed on 08.06.2025, European Union. 2022. URL: https://commission.europa.eu/strategy-and-policy/priorities-2019-2024/europe-fit-digital-age/digital-services-act_en (cited on page 133).
[5] Nikolas Adaloglou. How Attention works in Deep Learning: understanding the attention mechanism in sequence models. Accessed on 05.09.2025. URL: https://theaisummer.com/attention/ (cited on pages 76, 77).
[6] Marek Adamove. Contextual music playlist recommendation. Master Thesis, Faculty of Informatics and Information Technologies, Slovak University of Technology in Bratislava. 2025 (cited on page 12).
[7] Charu C Aggarwal. Data mining: the textbook. Springer, 2015. DOI: 10.1007/978-3-319-14142-8 (cited on page 121).
[8] Charu C Aggarwal et al. Recommender systems. Springer, 2016. ISBN: 978-3-319-29657-9 (cited on pages 12, 37, 63).
[9] Charu C Aggarwal and Philip S Yu. “A general survey of privacy-preserving data mining models and algorithms”. In: Privacy-preserving data mining. Springer, 2008, pages 11–52. DOI: 10.1007/978-0-387-70992-5_2 (cited on page 123).
[10] Airflow. Apache Airflow - A platform to programmatically author, schedule, and monitor workflows. Accessed on 21.12.2023. URL: https://github.com/apache/airflow (cited on page 105).
[11] Fatemeh Alyari and Nima Jafari Navimipour. “Recommender systems: A systematic review of the state of the art literature and suggestions for future research”. In: Kybernetes 47.5 (2018), pages 985–1017 (cited on page 31).
[12] Sana Abida Amin, James Philips, and Nasseh Tabrizi. “Current trends in collaborative filtering recommendation systems”. In: World Congress on Services. Springer. 2019, pages 46–60 (cited on page 19).
[13] Alejandro Barredo Arrieta et al. “Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI”. In: Information Fusion 58 (2020), pages 82–115. DOI: 10.1016/j.inffus.2019.12.012 (cited on page 133).
[14] ARX. ARX data anonymization tool. Accessed on 11.01.2024. URL: https://arx.deidentifier.org/ (cited on page 122).
[15] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. “Neural Machine Translation by Jointly Learning to Align and Translate”. In: International Conference on Learning Representations (ICLR). 2015. URL: https://arxiv.org/abs/1409.0473 (cited on pages 76, 78).
[16] Antonio Barbalau et al. “Black-Box Ripper: Copying black-box models using generative evolutionary algorithms”. In: Advances in Neural Information Processing Systems 33 (2020), pages 20120–20129. DOI: 10.5555/3495724.3497413 (cited on page 118).
[17] Luke A Bauer and Vincent Bindschaedler. “Towards realistic membership inferences: The case of survey data”. In: Annual Computer Security Applications Conference. 2020, pages 116–128. DOI: 10.1145/3427228.3427282 (cited on pages 116, 117, 123).
[18] Khalid Benabbes et al. “Recommendation System Issues, Approaches and Challenges Based on User Reviews”. In: Journal of Web Engineering 21.4 (2022), pages 1017–1054. DOI: 10.13052/jwe1540-9589.2143 (cited on page 19).
[19] Yoshua Bengio et al. “A neural probabilistic language model”. In: Journal of machine learning research 3 (Feb. 2003), pages 1137–1155. URL: http://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf (cited on page 75).
[20] Lisana Berberi et al. “Machine learning operations landscape: platforms and tools”. In: Artificial Intelligence Review 58.6 (2025), page 167. DOI: 10.1007/s10462-025-11164-3 (cited on pages 106, 111).
[21] Elisa Bertino. “Data security and privacy: Concepts, approaches, and research directions”. In: 2016 IEEE 40th Annual computer Software and Applications conference. Volume 1. IEEE. 2016, pages 400–407. DOI: 10.1109/COMPSAC.2016.89 (cited on page 114).
[22] Joachim Biskup et al. “Efficient inference control for open relational queries”. In: IFIP Annual Conference on Data and Applications Security and Privacy. Springer. 2010, pages 162–176. DOI: 10.1007/978-3-642-13739-6_11 (cited on page 123).
[23] Erion Çano. “Hybrid Recommender Systems: A Systematic Literature Review”. In: Intelligent Data Analysis 21 (Nov. 2017), pages 1487–1524. DOI: 10.3233/IDA-163209 (cited on page 31).
[24] George Casella and Roger L Berger. Statistical inference. Cengage Learning, 2002. ISBN: 0-534-24312-6 (cited on page 63).
[25] Hao Chen, Ilaria Chillotti, and Yongsoo Song. “Multi-Key Homomorphic Encryption from TFHE”. In: Advances in Cryptology – ASIACRYPT 2019. Edited by Steven Galbraith and Shiho Moriai. Cham: Springer International Publishing, 2019, pages 446–472. ISBN: 978-3-030-34621-8 (cited on page 128).
[26] Rui Chen et al. “A Survey of Collaborative Filtering-Based Recommender Systems: From Traditional Methods to Hybrid Methods Based on Social Networks”. In: IEEE Access 6 (2018), pages 64301–64320. DOI: 10.1109/ACCESS.2018.2877208 (cited on page 16).
[27] Xinyun Chen et al. Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning. 2017. arXiv: 1712.05526 [cs.CR] (cited on pages 117, 118).
[28] Yu Chen et al. “A training-integrity privacy-preserving federated learning scheme with trusted execution environment”. In: Information Sciences 522 (2020), pages 69–79. DOI: 10.1016/j.ins.2020.02.037 (cited on page 128).
[29] Chip Huyen. Designing machine learning systems: An iterative process for production-ready applications. O’Reilly Media, 2022. ISBN: 978-1098107963 (cited on pages 94, 105, 111).
[30] François Chollet. Keras - Deep Learning for humans. Accessed on 12.12.2023. 2023. URL: https://github.com/keras-team/keras (cited on page 112).
[31] David Cournapeau. scikit-learn: machine learning in Python. Accessed on 12.12.2023. URL: https://github.com/scikit-learn/scikit-learn (cited on page 112).
[32] Koby Crammer, Mark Dredze, and Fernando Pereira. “Exact convex confidence-weighted learning”. In: Advances in Neural Information Processing Systems 21 (2008). URL: https://proceedings.neurips.cc/paper_files/paper/2008/file/68ce199ec2c5517597ce0a4d89620f55-Paper.pdf (cited on page 35).
[33] Crypten. CrypTen is a framework for Privacy Preserving Machine Learning built on PyTorch. Accessed on 08.04.2025. URL: https://github.com/facebookresearch/crypten (cited on page 130).
[34] DataCamp. 17 Top MLOps Tools You Need to Know. Accessed on 04.12.2023. URL: https://www.datacamp.com/blog/top-mlops-tools (cited on page 108).
[35] DataRobot. MLOps 101: The Foundation for Your AI Strategy. Accessed on 29.12.2023. URL: https://www.datarobot.com/mlops-101/ (cited on page 108).
[36] DL4J. deeplearning4j (DL4J) Suite of tools for deploying and training deep learning models using the JVM. Accessed on 12.12.2023. URL: https://github.com/deeplearning4j/deeplearning4j (cited on page 113).
[37] Cynthia Dwork, Aaron Roth, et al. “The algorithmic foundations of differential privacy”. In: Foundations and Trends® in Theoretical Computer Science 9.3–4 (2014), pages 211–407. DOI: 10.1561/0400000042 (cited on page 124).
[38] Ahmed El Ouadrhiri and Ahmed Abdelhadi. “Differential privacy for deep and federated learning: A survey”. In: IEEE access 10 (2022), pages 22359–22380. DOI: 10.1109/ACCESS.2022.3151670 (cited on page 124).
[39] tf-encrypted. A Framework for Encrypted Machine Learning in TensorFlow. Accessed on 08.04.2025. URL: https://github.com/tf-encrypted/tf-encrypted (cited on page 130).
[40] EU Ethics. EU guidelines on ethics in artificial intelligence: Context and implementation. Accessed on 08.06.2025, European Union. URL: https://www.europarl.europa.eu/thinktank/en/document.html?reference=EPRS_BRI(2019)640163 (cited on page 133).
[41] David Evans, Vladimir Kolesnikov, Mike Rosulek, et al. “A pragmatic introduction to secure multi-party computation”. In: Foundations and Trends® in Privacy and Security 2.2-3 (2018), pages 70–246. DOI: 10.1561/3300000019 (cited on page 128).
[42] EvidentlyAI. Evaluate and monitor ML models from validation to production. Accessed on 29.11.2024. URL: https://www.evidentlyai.com/ (cited on page 109).
[43] fast.ai. The fastai deep learning library. Accessed on 12.12.2023. URL: https://github.com/fastai/fastai (cited on page 113).
[44] Fate. FATE (Federated AI Technology Enabler). Accessed on 04.04.2025. URL: https://github.com/FederatedAI/FATE (cited on page 130).
[45] Tensorflow Federated. Tensorflow Federated: Machine Learning on Decentralized Data. Accessed on 04.04.2025. URL: https://github.com/google-parfait/tensorflow-federated (cited on page 130).
[46] FedML. FedML, Federated Learning/Analytics and Edge AI Platform. Accessed on 04.04.2025. URL: https://fedml.ai/ (cited on page 130).
[47] NVIDIA FLARE. NVIDIA Federated Learning Application Runtime Environment. Accessed on 04.04.2025. URL: https://github.com/NVIDIA/NVFlare (cited on page 130).
[48] Flower. Flower - A Friendly Federated Learning Framework. Accessed on 04.04.2025. URL: https://github.com/adap/flower (cited on page 130).
[49] Martin Garriga et al. “Dataops for cyber-physical systems governance: The airport passenger flow case”. In: ACM Transactions on Internet Technology (TOIT) 21.2 (2021), pages 1–25. DOI: 10.1145/3432247 (cited on page 106).
[50] EU GDPR. General Data Protection Regulation. Accessed on 08.06.2025, European Union. URL: https://eur-lex.europa.eu/eli/reg/2016/679/oj (cited on pages 121, 123, 133).
[51] GeeksforGeeks. Software and its Types. Accessed on 08.06.2025. URL: https://www.geeksforgeeks.org/computer-science-fundamentals/software-and-its-types/ (cited on page 93).
[52] geeksforgeeks. Difference Between Fine-Tuning and Transfer Learning. Accessed on 05.09.2025. URL: https://www.geeksforgeeks.org/machine-learning/what-is-the-difference-between-fine-tuning-and-transfer-learning/ (cited on page 79).
[53] geeksforgeeks. Top 15 Vector Databases in 2025. Accessed on 08.08.2025. URL: geeksforgeeks (cited on page 110).
[54] Craig Gentry. A fully homomorphic encryption scheme. ProQuest Dissertations Publishing, 3382729. Stanford University. Accessed on 12.12.2023. URL: https://www.proquest.com/docview/305003863?pq-origsite=gscholar&fromopenview=true (cited on page 127).
[55] Xueluan Gong et al. “InverseNet: Augmenting Model Extraction Attacks with Training Data Inversion.” In: IJCAI. 2021, pages 2439–2447. URL: https://www.ijcai.org/proceedings/2021/0336.pdf (cited on pages 117, 118).
[56] Google. Wide and Deep Learning: Better Together with TensorFlow. Accessed on 05.09.2025. URL: https://research.google/blog/wide-amp-deep-learning-better-together-with-tensorflow/ (cited on page 89).
[57] Google. Cloud Architecture Center - MLOps: Continuous delivery and automation pipelines in machine learning. Accessed on 12.12.2023. URL: https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning (cited on page 106).
[58] GoogleCloud. Scaling deep retrieval with TensorFlow Recommenders and Vertex AI Matching Engine. Accessed on 09.09.2025. URL: https://cloud.google.com/blog/products/ai-machine-learning/scaling-deep-retrieval-tensorflow-two-towers-architecture (cited on page 88).
[59] GoogleDP. Google’s differential privacy libraries. Accessed on 08.04.2025. URL: https://github.com/google/differential-privacy (cited on page 130).
[60] GoogleJAX. JAX - Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more. Accessed on 11.01.2024. URL: https://github.com/google/jax (cited on page 112).
[61] H2O. H2O is an Open Source, Distributed, Fast and Scalable Machine Learning Platform. Accessed on 12.12.2023. URL: https://github.com/h2oai/h2o-3 (cited on page 113).
[62] Zecheng He, Tianwei Zhang, and Ruby B Lee. “Model inversion attacks against collaborative inference”. In: Proceedings of the 35th Annual Computer Security Applications Conference. 2019, pages 148–162. DOI: 10.1145/3359789.3359824 (cited on pages 117, 118).
[63] HElib. Open-source software library that implements homomorphic encryption (HE). Accessed on 12.12.2023. URL: https://github.com/homenc/HElib (cited on pages 130, 131).
[64] Laura Hergenrother and Seongwook Park. Fully Homomorphic Encryption on IBM Cloud Hyper Protect Virtual Servers. Accessed on 12.12.2023. URL: https://www.proquest.com/docview/305003863?pq-origsite=gscholar&fromopenview=true (cited on page 127).
[65] Johannes Heurix et al. “A taxonomy for privacy enhancing technologies”. In: Computers & Security 53 (2015), pages 1–17. DOI: 10.1016/j.cose.2015.05.002 (cited on page 119).
[66] Nipuni Hewage and Dulani Meedeniya. “Machine Learning Operations: A Survey on MLOps Tool Support”. In: ArXiv (2022). DOI: 10.48550/ARXIV.2202.10169 (cited on page 105).
[67] HIPS-Autograd. Autograd - Efficiently computes derivatives of numpy code. Accessed on 12.01.2024. URL: https://github.com/hips/autograd (cited on page 112).
[68] W Ronny Huang et al. “Metapoison: Practical general-purpose clean-label data poisoning”. In: Advances in Neural Information Processing Systems 33 (2020), pages 12080–12091. URL: https://proceedings.neurips.cc/paper_files/paper/2020 (cited on pages 117, 118).
[69] HuggingFace. Production Inference Made Easy. Accessed on 08.08.2025. URL: https://huggingface.co/inference-endpoints/dedicated (cited on page 109).
[70] Waldemar Hummer et al. “Modelops: Cloud-based lifecycle management for reliable and trusted ai”. In: 2019 IEEE International Conference on Cloud Engineering (IC2E). IEEE. 2019, pages 113–120. DOI: 10.1109/IC2E.2019.00025 (cited on page 106).
[71] IBM-Diffprivlib. Diffprivlib is a general-purpose library for experimenting with, investigating and developing applications in, differential privacy. Accessed on 08.04.2025. URL: https://github.com/IBM/differential-privacy-library (cited on page 130).
[72] IBM-FL. IBM Federated Learning. Accessed on 04.04.2025. URL: https://github.com/IBM/federated-learning-lib (cited on page 130).
[73] ISO.org. ISO/IEC 20889:2018 Privacy enhancing data de-identification terminology and classification of techniques. 2018. URL: https://www.iso.org/standard/69373.html (cited on pages 121, 123).
[74] ISO.org. ISO/IEC 5338:2023 Information technology — Artificial intelligence — AI system life cycle processes. Accessed on 08.08.2025. URL: https://www.iso.org/standard/81118.html (cited on page 93).
[75] Amir H Jadidinejad, Craig Macdonald, and Iadh Ounis. “Unifying explicit and implicit feedback for rating prediction and ranking recommendation tasks”. In: Proceedings of the 2019 ACM SIGIR international conference on theory of information retrieval. 2019, pages 149–156. DOI: 10.1145/3341981.3344225 (cited on page 15).
[76] Dietmar Jannach and Michael Jugovac. “Measuring the business value of recommender systems”. In: ACM Transactions on Management Information Systems (TMIS) 10.4 (2019), pages 1–23. DOI: 10.1145/3370082 (cited on page 12).
[77] Dietmar Jannach et al. Recommender systems: an introduction. Cambridge university press, 2010. ISBN: 978-0-521-49336-9 (cited on page 37).
[78] Adel Jebali, Salma Sassi, and Abderrazak Jemai. “Inference Control in Distributed Environment: A Comparison Study”. In: International Conference on Risks and Security of Internet and Systems. Springer. 2019, pages 69–83. DOI: 10.1007/978-3-030-41568-6_5 (cited on page 123).
[79] Yitong Ji et al. “A critical study on data leakage in recommender system offline evaluation”. In: ACM Transactions on Information Systems 41.3 (2023), pages 1–27. DOI: 10.1145/35699 (cited on page 39).
[80] Lauma Jokste. “Towards a model of context-aware recommender system”. In: CEUR Workshop Proceedings 1367 (Jan. 2015), pages 145–152 (cited on page 21).
[81] Georgios A Kaissis et al. “Secure, privacy-preserving and federated machine learning in medical imaging”. In: Nature Machine Intelligence 2.6 (2020), pages 305–311. DOI: 10.1038/s42256-020-0186-1 (cited on page 128).
[82] Ehud Karavani. Solving Simpson’s Paradox with Inverse Probability Weighting. Accessed on 12.08.2025. URL: https://medium.com/data-science/solving-simpsons-paradox-with-inverse-probability-weighting-79dbb1395597 (cited on page 35).
[83] Krishnaram Kenthapadi, Nina Mishra, and Kobbi Nissim. “Denials leak information: Simulatable auditing”. In: Journal of Computer and System Sciences 79.8 (2013), pages 1322–1340. DOI: 10.1016/j.jcss.2013.06.004 (cited on page 123).
[84] Keras3. Introducing Keras 3.0. Accessed on 12.12.2023. URL: https://keras.io/keras_3/ (cited on page 112).
[85] Soo-Cheol Kim et al. “Improvement of collaborative filtering using rating normalization”. In: Multimedia tools and applications 75 (2016), pages 4957–4968. DOI: 10.1007/s11042-013-1814-0 (cited on page 18).
[86] Daniel Kluver, Michael D Ekstrand, and Joseph A Konstan. “Rating-based collaborative filtering: algorithms and evaluation”. In: Social information access: Systems and technologies (2018), pages 344–390. DOI: 10.1007/978-3-319-90092-6_10 (cited on page 14).
[87] Dominik Kreuzberger, Niklas Kühl, and Sebastian Hirschl. “Machine learning operations (mlops): Overview, definition, and architecture”. In: IEEE Access (2023). DOI: 10.1109/ACCESS.2023.3262138 (cited on page 106).
[88] Kubeflow. Machine Learning Toolkit for Kubernetes. Accessed on 10.12.2024. URL: https://github.com/kubeflow/kubeflow (cited on page 109).
[89] Pushpendra Kumar and Ramjeevan Singh Thakur. “Recommendation system techniques and related issues: a survey”. In: International Journal of Information Technology 10 (2018), pages 495–501. DOI: 10.1007/s41870-018-0138-8 (cited on page 19).
[90] Ruslan Kuprieiev et al. DVC: Data Version Control - Git for Data & Models. Version 3.30.3. Nov. 2023. URL: https://doi.org/10.5281/zenodo.10214841 (cited on page 107).
[91] lakeFS. Scalable Data Version Control. Accessed on 29.12.2023. URL: https://lakefs.io/ (cited on page 108).
[92] Abdullah Lakhan et al. “Federated learning enables intelligent reflecting surface in fog-cloud enabled cellular network”. In: PeerJ Computer Science 7 (2021), e758. DOI: 10.7717/peerj-cs.758 (cited on page 128).
[93] Marian Lambert et al. “Robustness analysis of machine learning models using domain-specific test data perturbation”. In: EPIA Conference on Artificial Intelligence. Springer. 2023, pages 158–170. DOI: 10.1007/978-3-031-49008-8_13 (cited on page 123).
[94] LangChain. LangChain - The platform for reliable agents. Accessed on 08.08.2025. URL: https://www.langchain.com/ (cited on page 110).
[95] Larry Peterson and Bruce Davie. Computer Networks: A Systems Approach, 6th Edition. Morgan Kaufmann, 2021. ISBN: 978-0128182000. URL: https://book.systemsapproach.org (cited on pages 113, 115).
[96] QH Le, SL Vu, and AC Le. “A comparative analysis of various approaches for incorporating contextual information into recommender systems”. In: Computer Systems Science and Engineering (2021). DOI: 10.3844/jcssp.2022.187.203 (cited on page 23).
[97] Kay Lefevre et al. “ModelOps for enhanced decision-making and governance in emergency control rooms”. In: Environment Systems and Decisions 42.3 (2022), pages 402–416. DOI: 10.1007/s10669-022-09855-1 (cited on page 106).
[98] Klas Leino and Matt Fredrikson. “Stolen memories: Leveraging model memorization for calibrated White-Box membership inference”. In: 29th USENIX security symposium (USENIX Security 20). 2020, pages 1605–1622. URL: https://www.usenix.org/system/files/sec20-leino.pdf (cited on page 116).
[99] Ninghui Li, Tiancheng Li, and Suresh Venkatasubramanian. “t-closeness: Privacy beyond k-anonymity and l-diversity”. In: 2007 IEEE 23rd international conference on data engineering. IEEE. 2006, pages 106–115. DOI: 10.1109/ICDE.2007.367856 (cited on page 121).
[100] Ninghui Li, Tiancheng Li, and Suresh Venkatasubramanian. “t-closeness: Privacy beyond k-anonymity and l-diversity”. In: 2007 IEEE 23rd International Conference on Data Engineering. IEEE. 2007, pages 106–115. DOI: 10.1109/ICDE.2007.367856 (cited on page 121).
[101] Yangguang Li et al. “Predicting node failures in an ultra-large-scale cloud computing platform: an aiops solution”. In: ACM Transactions on Software Engineering and Methodology (TOSEM) 29.2 (2020), pages 1–24. DOI: 10.1145/3385187 (cited on page 106).
[102] Tongliang Liu and Dacheng Tao. “Classification with noisy labels by importance reweighting”. In: IEEE Transactions on pattern analysis and machine intelligence 38.3 (2015), pages 447–461. DOI: 10.1109/TPAMI.2015.2456899 (cited on page 35).
[103] Adriana López-Alt, Eran Tromer, and Vinod Vaikuntanathan. “On-the-fly multiparty computation on the cloud via multikey fully homomorphic encryption”. In: Proceedings of the forty-fourth annual ACM symposium on Theory of computing. 2012, pages 1219–1234 (cited on page 128).
[104] Luigi. Luigi - Python module to build complex pipelines of batch jobs. Accessed on 21.12.2023. 2023. URL: https://github.com/spotify/luigi (cited on page 105).
[105] Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. “Effective Approaches to Attention-based Neural Machine Translation”. In: Conference on Empirical Methods in Natural Language Processing (EMNLP). 2015, pages 1412–1421. URL: https://arxiv.org/abs/1508.04025 (cited on page 76).
[106] Welder Pinheiro Luz, Gustavo Pinto, and Rodrigo Bonifácio. “Adopting DevOps in the real world: A theory, a model, and a case study”. In: Journal of Systems and Software 157 (2019), page 110384. DOI: 10.1016/j.jss.2019.07.083 (cited on page 106).
[107] Oleksandr Lytvyn. Machine Learning and Sensitive Data Protection. Dissertation Thesis, Faculty of Informatics and Information Technologies, Slovak University of Technology in Bratislava. 2025 (cited on pages 117, 130).
[108] Ashwin Machanavajjhala et al. “l-diversity: Privacy beyond k-anonymity”. In: ACM Transactions on Knowledge Discovery from Data (TKDD) 1.1 (2007), 3–es. DOI: 10.1145/1217299.1217302 (cited on page 121).
[109] Ganesh Kumar Mahato and Swarnendu Kumar Chakraborty. “A comparative review on homomorphic encryption for cloud security”. In: IETE Journal of Research 69.8 (2023), pages 5124–5133. DOI: 10.1080/03772063.2021.1965918 (cited on page 127).
[110] Abdul Majeed and Sungchang Lee. “Anonymization techniques for privacy preserving data publishing: A comprehensive survey”. In: IEEE Access 9 (2020), pages 8512–8545. DOI: 10.1109/ACCESS.2020.3045700 (cited on page 121).
[111] Mohammad Malekzadeh, Anastasia Borovykh, and Deniz Gündüz. “Honest-but-curious nets: Sensitive attributes of private inputs can be secretly coded into the classifiers’ outputs”. In: Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security. 2021, pages 825–844. DOI: 10.1145/3460120.3484533 (cited on pages 117, 118).
[112] Inas Amjed Al-Mani, Ali M. Ahmed Al-Sabaawi, and Mohsin Hasan Hussien. “A Review Paper of Model Based Collaborative Filtering Techniques”. In: 2022 International Conference on Data Science and Intelligent Computing (ICDSIC). 2022, pages 52–57. DOI: 10.1109/ICDSIC56987.2022.10076148 (cited on page 19).
[113] Luca Melis et al. “Exploiting unintended feature leakage in collaborative learning”. In: 2019 IEEE symposium on security and privacy (SP). IEEE. 2019, pages 691–706. DOI: 10.1109/SP.2019.00029 (cited on pages 117, 118).
[114] Alfred J Menezes, Paul C Van Oorschot, and Scott A Vanstone. Handbook of applied cryptography. CRC press, 2018. ISBN: 9780429466335. DOI: 10.1201/9780429466335 (cited on page 126).
[115] MetaPlatforms. Opacus a library that enables training PyTorch models with differential privacy. Accessed on 08.04.2025. URL: https://github.com/pytorch/opacus (cited on page 130).
[116] Ilya Mironov, Kunal Talwar, and Li Zhang. Rényi Differential Privacy of the Sampled Gaussian Mechanism. 2019. arXiv: 1908.10530 [cs.LG] (cited on page 124).
[117] MLflow-Org(Databricks). MLflow: A Machine Learning Lifecycle Platform. Accessed on 10.12.2024. URL: https://github.com/mlflow/mlflow/ (cited on page 107).
[118] Fan Mo et al. “PPFL: privacy-preserving federated learning with trusted execution environments”. In: Proceedings of the 19th annual international conference on mobile systems, applications, and services. 2021, pages 94–108. DOI: 10.1145/3458864.3466628 (cited on page 128).
[119] Marwa Hussien Mohamed, Mohamed Helmy Khafagy, and Mohamed Hasan Ibrahim. “Recommender Systems Challenges and Solutions Survey”. In: 2019 International Conference on Innovative Trends in Computer Engineering (ITCE). 2019, pages 149–155. DOI: 10.1109/ITCE.2019.8646645 (cited on page 19).
[120] Seyed-Mohsen Moosavi-Dezfooli et al. “Universal adversarial perturbations”. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2017, pages 1765–1773. URL: https://openaccess.thecvf.com/content_cvpr_2017/html/Moosavi-Dezfooli_Universal_Adversarial_Perturbations_CVPR_2017_paper.html (cited on page 117).
[121] Conor Morgan, Iulia Paun, and Nikos Ntarmos. “Exploring contextual paradigms in context-aware recommendations”. In: 2020 IEEE International Conference on Big Data (Big Data). IEEE. 2020, pages 3079–3084. DOI: 10.1109/BigData50022.2020.9377964 (cited on page 23).
[122] Aiswarya Raj Munappy et al. “From ad-hoc data analytics to dataops”. In: Proceedings of the International Conference on Software and System Processes. 2020, pages 165–174. DOI: 10.1145/3379177.3388909 (cited on page 106).
[123] Kundan Munjal and Rekha Bhatia. “A systematic review of homomorphic encryption and its contributions in healthcare industry”. In: Complex & Intelligent Systems 9.4 (2023), pages 3759–3786. DOI: 10.1007/s40747-022-00756-z (cited on page 127).
[124] Najdt Mustafa et al. “Collaborative filtering: Techniques and applications”. In: 2017 International Conference on Communication, Control, Computing and Electronics Engineering (ICCCCEE). 2017, pages 1–6. DOI: 10.1109/ICCCCEE.2017.7867668 (cited on page 19).
[125] Milad Nasr, Reza Shokri, and Amir Houmansadr. “Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning”. In: 2019 IEEE symposium on security and privacy (SP). IEEE. 2019, pages 739–753. DOI: 10.1109/SP.2019.00065 (cited on page 116).
[126] Netflix. Build and manage real-life data science projects with ease! Accessed on 10.12.2024. URL: https://github.com/Netflix/
[127] Giang Nguyen. Introduction to Data Science. Spektrum STU Publishing, 2022. ISBN: 978-80-227-5193-3. URL: https://elvira.fiit.stuba.sk (cited on pages 31, 48, 63, 94).
[128] Giang Nguyen et al. “Machine learning and deep learning frameworks and libraries for large-scale data mining: a survey”. In: Artificial Intelligence Review 52 (2019), pages 77–124. DOI: 10.1007/s10462-018-09679-z (cited on page 112).
[129] Giang Nguyen et al. “Network security AIOps for online stream data monitoring”. In: Neural Computing and Applications (2024), pages 1–25. DOI: 10.1007/s00521-024-09863-z (cited on page 83).
[130] Giang Nguyen et al. “Landscape of machine learning evolution: privacy-preserving federated learning frameworks and tools”. In: Artificial Intelligence Review 58.2 (2025), page 51. DOI: 10.1007/s10462-024-11036-2 (cited on pages 112, 114, 120, 121, 129, 131).
[131] Athanasios N Nikolakopoulos et al. “Trust your neighbors: A comprehensive survey of neighborhood-based methods for recommender systems”. In: Recommender systems handbook (2021), pages 39–89. DOI: 10.1007/978-1-0716-2197-4_2 (cited on page 18).
[132] Paolo Notaro, Jorge Cardoso, and Michael Gerndt. “A survey of aiops methods for failure management”. In: ACM Transactions on Intelligent Systems and Technology (TIST) 12.6 (2021), pages 1–45. DOI: 10.1145/3483424 (cited on page 106).
[133] NVidia. Using Neural Networks for Your Recommender System. Accessed on 05.09.2025. URL: https://developer.nvidia.com/
[134] OpenFHE. Open-Source Fully Homomorphic Encryption Library. Accessed on 12.12.2023. URL: https://github.com/openfheorg/openfhe-development (cited on page 131).
[135] Pachyderm. Data-Centric Pipelines and Data Versioning. Accessed on 10.12.2024. URL: https://github.com/pachyderm/pachyderm (cited on page 108).
[136] PaddlePaddle. PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice. Accessed on 12.12.2023. URL: https://github.com/PaddlePaddle/Paddle (cited on page 113).
[137] PapersWithCode. Papers with code. Trends on the paper implementations grouped by framework. Accessed on 11.12.2023. URL: https://paperswithcode.com/trends (cited on page 112).
[138] Natalia Ponomareva et al. “How to dp-fy ml: A practical guide to machine learning with differential privacy”. In: Journal of Artificial Intelligence Research 77 (2023), pages 1113–1201. DOI: 10.1613/jair.1.14649 (cited on page 124).
[139] Prefect. Prefect is a workflow orchestration tool empowering developers to build, observe, and react to data pipelines. Accessed on 10.12.2024. URL: https://github.com/PrefectHQ/prefect (cited on page 109).
[140] Tensorflow Privacy. TensorFlow Privacy is a Python library that includes implementations of TensorFlow optimizers for training machine learning models with differential privacy. Accessed on 08.04.2025. URL: https://github.com/tensorflow/privacy (cited on page 130).
[141] PyDP. PyDP The Python Differential Privacy Library. Accessed on 08.04.2025. URL: https://github.com/OpenMined/PyDP (cited on page 130).
[142] PySyft. OpenMined/PySyft - Data science on data without acquiring a copy. Accessed on 04.04.2025. URL: https://github.com/OpenMined/PySyft (cited on pages 129, 130).
[143] PyTorch. PyTorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration. Accessed on 12.12.2023. URL: https://github.com/pytorch/pytorch (cited on page 112).
[144] qdrant. Qdrant - High-Performance Vector Search at Scale. Accessed on 08.08.2025. URL: https://qdrant.tech/ (cited on page 110).
[145] Qentelli. Comprehensive List of DevOps Tools 2024. Accessed on 16.02.2024. URL: https://www.qentelli.com/thought-leadership/insights/devops-tools (cited on page 106).
[146] Sandeep K Raghuwanshi and Rajesh Kumar Pateriya. “Recommendation systems: techniques, challenges, application, and evaluation”. In: Soft Computing for Problem Solving: SocProS 2017, Volume 2. Springer, 2018, pages 151–164. DOI: 10.1007/978-981-13-1595-4_12 (cited on page 18).
[147] Shaina Raza and Chen Ding. “Progress in context-aware recommender systems—An overview”. In: Computer Science Review 31 (2019), pages 84–97. DOI: 10.1016/j.cosrev.2019.01.001 (cited on page 21).
[148] Steffen Rendle. “Factorization machines”. In: 2010 IEEE International conference on data mining. IEEE. 2010, pages 995–1000. DOI: 10.1109/ICDM.2010.127 (cited on page 18).
[149] Francesco Ricci, Lior Rokach, and Bracha Shapira. “Recommender systems: Techniques, applications, and challenges”. In: Recommender systems handbook (2021), pages 1–35. DOI: 10.1007/978-1-0716-2197-4_1 (cited on page 12).
[150] Maria Rigaki and Sebastian Garcia. A Survey of Privacy Attacks in Machine Learning. 2021. arXiv: 2007.07646 [cs.CR] (cited on pages 117, 118).
[151] riverml. A Python package for online/streaming machine learning. Accessed on 05.09.2025. URL: https://riverml.xyz (cited on page 109).
[152] Stuart Russell and Peter Norvig. Artificial intelligence: a modern approach, 4th Edition. Pearson, 2021. ISBN: 978-0134610993 (cited on pages 9, 37, 69).
[153] Judith Sáinz-Pardo Díaz. Privacy-preserving techniques for Data Science enviroment. Dissertation Thesis, Universidad de Cantabria, Spain. 2025 (cited on pages 122, 125).
[154] SEAL. Microsoft SEAL is an easy-to-use and powerful homomorphic encryption library. Accessed on 12.12.2023. URL: https://github.com/microsoft/SEAL (cited on pages 130, 131).
[155] SecretFlow. SecretFlow - A unified framework for privacy-preserving data analysis and machine learning. Accessed on 12.12.2023. 2023. URL: https://github.com/secretflow/secretflow (cited on pages 130, 131).
[156] S. M. Mahdi Seyednezhad et al. A Review on Recommendation Systems: Context-aware to Social-based. 2018. arXiv: 1811.11866 [cs.IR]. URL: https://arxiv.org/abs/1811.11866 (cited on page 19).
[157] Reza Shokri et al. “Membership inference attacks against machine learning models”. In: 2017 IEEE symposium on security and privacy (SP). IEEE. 2017, pages 3–18. DOI: 10.1109/SP.2017.41 (cited on pages 116, 117).
[158] Anshuman Suri and David Evans. “Formalizing and Estimating Distribution Inference Risks”. In: Proceedings on Privacy Enhancing Technologies 4 (2022), pages 528–551. URL: https://petsymposium.org/popets/2022/popets-2022-0121.pdf (cited on pages 117, 118).
[159] Latanya Sweeney. “k-anonymity: A model for protecting privacy”. In: International journal of uncertainty, fuzziness and knowledge-based systems 10.05 (2002), pages 557–570. DOI: 10.1142/S0218488502001648 (cited on page 121).
[160] TenSEAL. A library for doing homomorphic encryption operations on tensors. Accessed on 12.12.2023. URL: https://github.com/OpenMined/TenSEAL (cited on pages 130, 131).
[161] Tensorflow. Tensorflow - An Open Source Machine Learning Framework for Everyone. Accessed on 12.12.2023. URL: https://github.com/tensorflow/tensorflow (cited on page 112).
[162] TensorFlow-XLA. XLA (Accelerated Linear Algebra) open-source compiler for machine learning. Accessed on 12.01.2024. URL: https://www.tensorflow.org/xla (cited on page 112).
[163] Vale Tolpegin et al. “Data poisoning attacks against federated learning systems”. In: Computer Security–ESORICS 2020: 25th European Symposium on Research in Computer Security, ESORICS 2020, Guildford, UK, September 14–18, 2020, Proceedings, Part I 25. Springer. 2020, pages 480–501. DOI: 10.1007/978-3-030-58951-6_24 (cited on pages 117, 118).
[164] Florian Tramèr et al. “Stealing Machine Learning Models via Prediction APIs.” In: USENIX security symposium. Volume 16. 2016, pages 601–618. URL: https://mahmoudnabil.github.io/Teaching/ECEN885F20/paper1.pdf (cited on pages 117, 118).
[165] Andreu Vall et al. “Feature-combination hybrid recommender systems for automated music playlist continuation”. In: The Journal of Personalization Research (2018). DOI: 10.1007/s11257-018-9215-8 (cited on page 89).
[166] Arnaud Van Looveren et al. Alibi Detect: Algorithms for outlier, adversarial and drift detection. Version 0.11.4. July 7, 2023. URL: https://github.com/SeldonIO/alibi-detect (cited on page 109).
[167] Ashish Vaswani et al. “Attention is all you need”. In: Advances in neural information processing systems 30 (2017). URL: https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf (cited on page 76).
[168] Zhibo Wang et al. “Beyond inferring class representatives: User-level privacy leakage from federated learning”. In: IEEE INFOCOM 2019-IEEE conference on computer communications. IEEE. 2019, pages 2512–2520. DOI: 10.1109/INFOCOM.2019.8737416 (cited on pages 117, 118).
[169] WeightsAndBiases. 17 Top MLOps Tools You Need to Know. Accessed on 12.12.2023. 2023. URL: https://www.datacamp.com/blog/top-mlops-tools (cited on page 106).
[170] Stallings William. Computer Security Principles and Practice, 5th Edition. Pearson, 2023. ISBN: 978-0138091583 (cited on page 113).
[171] WMR2021. Workflow Management Review: Airflow vs. Luigi. Accessed on 21.12.2023. URL: https://www.upsolver.com/blog/
[172] XGBoost. XGBoost - Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow. Accessed on 12.12.2023. URL: https://github.com/dmlc/xgboost (cited on page 113).
[173] Suresh Yaram. Machine Learning Model Development and Operations: Principles and Practice. Accessed on 07.12.2022. 2021. URL: https://www.kdnuggets.com/2021/10/machine-learning-model-development-operations-principles-practice.html (cited on page 108).
[174] Aston Zhang et al. Dive into Deep Learning. https://D2L.ai. Cambridge University Press, 2023. ISBN: 978-1009389433 (cited on pages 73, 80, 88, 91).
[175] Qian Zhang, Jie Lu, and Yaochu Jin. “Artificial intelligence in recommender systems”. In: Complex and Intelligent Systems 7.1 (2021), pages 439–457. DOI: 10.1007/s40747-020-00212-w (cited on page 19).
[176] Xuezhou Zhang, Xiaojin Zhu, and Laurent Lessard. “Online data poisoning attacks”. In: Learning for Dynamics and Control. PMLR. 2020, pages 201–210. URL: https://proceedings.mlr.press/v120/zhang20b.html (cited on page 118).
[177] Xuejun Zhao et al. “Exploiting explanations for model inversion attacks”. In: Proceedings of the IEEE/CVF international conference on computer vision. 2021, pages 682–692. DOI: 10.1109/ICCV48922.2021.00072 (cited on pages 117, 118).
[178] Xingchen Zhou et al. “Deep model poisoning attack on federated learning”. In: Future Internet 13.3 (2021), page 73. DOI: 10.3390/fi13030073 (cited on pages 117, 118).