On Mining Most Popular Packages
Volume 9, Issue 4, Page No 60-72, 2024
Author’s Name: Yangjun Chen*, Bobin Chen
View Affiliations
Department of Applied Computer Science, University of Winnpeg, Manitoba, R3B 2E9, Canada
a)whom correspondence should be addressed. E-mail: y.chen@uwinnipeg.ca
Adv. Sci. Technol. Eng. Syst. J. 9(4), 60-72 (2024); DOI: 10.25046/aj090407
Keywords: Data Mining, Most popular packages, NP-complete, Priority-first tree search, Tries, Trie-like graphs
Export Citations
In this paper, we will discuss two algorithms to solve the so-called package design problem, by which a set of queries (referred to as a query log) is represented by a collection of bit strings with each indicating the favourite activities or items of customers. For such a query log, we are required to design a package of activities (or items) so that as many customers as possible can be satisfied. It is a typical problem of data mining. For this problem, the existing algorithm requires at least O(n2m) time, where m is the number of activities (or items) and n is the number of queries. We try to improve this time complexity. The main idea of our first algorithm is to use a new tree search strategy to explore the query log. Its average time complexity is bounded by O(nm2 + m2m/2). By our second algorithm, all query bit strings are organized into a graph, called a trie-like graph. Searching such a graph bottom-up, we can find a most popular package in O(n2m3(log2 nm)log2 nm) time. Both of them work much better than any existing strategy for this problem.
Received: 22 April 2024, Revised: 11 June 2024, Accepted: 31 July 2024, Published Online: 07 August, 2024
- [1] R. Agrawal, T. Imieli´nski, A. Swami, “Mining association rules between sets of items in large databases,” in Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, 207–216, Association for Computing Machinery, 1993, doi:10.1145/170035.170072.
- B. Gavish, D. Horsky, K. Srikanth, “An Approach to the Optimal Positioning of a New Product,” Management Science, 29, 1277–1297, 1983, doi:10.1287/mnsc.29.11.1277.
- T. S. Gruca, B. R. Klemz, “Optimal new product positioning: A genetic algorithm approach,” European Journal of Operational Research, 146, 621–633, 2003, doi:https://doi.org/10.1016/S0377-2217(02)00349-1.
- J. Resig, A. Teredesai, “A framework for mining instant messaging services,” in In Proceedings of the 2004 SIAM DM Conference, 2004.
- J. Han, J. Pei, Y. Yin, “Mining frequent patterns without candidate generation,” SIGMOD Rec., 29, 1–12, 2000, doi:10.1145/335191.335372.
- M. Miah, G. Das, V. Hristidis, H. Mannila, “Standing Out in a Crowd: Selecting Attributes for Maximum Visibility,” in 2008 IEEE 24th International Conference on Data Engineering, 356–365, 2008, doi:10.1109/ICDE.2008.4497444.
- J. C.-W. Lin, Y. Li, P. Fournier-Viger, Y. Djenouri, L. S.-L. Wang, “Mining High-Utility Sequential Patterns from Big Datasets,” in 2019 IEEE International Conference on Big Data (Big Data), 2674–2680, 2019, doi:10.1109/BigData47090.2019.9005996.
- A. Tonon, F. Vandin, “gRosSo: mining statistically robust patterns from a sequence of datasets,” Knowledge and Information Systems, 64, 2329–2359, 2022, doi:10.1007/s10115-022-01689-2.
- Y. Chen, W. Shi, “On the Designing of Popular Packages,” in 2018 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), 937–944, 2018, doi:10.1109/Cybermatics 2018.2018.00180.
- M. Miah, “Most popular package design,” in 4th Conference on Information Systems Applied Research, 2011.
- R. Kohli, R. Krishnamurti, P. Mirchandani, “The Minimum Satisfiability Problem,” SIAM Journal on Discrete Mathematics, 7, 275–283, 1994, doi:10.1137/S0895480191220836.
- S. A. Cook, The Complexity of Theorem-Proving Procedures, 143–152, Association for Computing Machinery, 1st edition, 2023.
- Y. Chen, “Signature files and signature trees,” Information Processing Letters, 82, 213–221, 2002, doi:https://doi.org/10.1016/S0020-0190(01)00266-6.
- Y. Chen, “On the signature trees and balanced signature trees,” in 21st International Conference on Data Engineering (ICDE’05), 742–753, 2005, doi:10.1109/ICDE.2005.99.
- Y. Chen, Y. Chen, “On the Signature Tree Construction and Analysis,” IEEE Transactions on Knowledge and Data Engineering, 18, 1207–1224, 2006, doi:10.1109/TKDE.2006.146.
- F. Grandi, P. Tiberio, P. Zezula, “Frame-sliced partitioned parallel signature files,” in Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 286–287, Association for Computing Machinery, 1992, doi:10.1145/133160.133211.
- D. L. Lee, Y. M. Kim, G. Patel, “Efficient signature file methods for text retrieval,” IEEE Transactions on Knowledge and Data Engineering, 7, 423–435, 1995, doi:10.1109/69.390248.
- D. L. Lee, C.-W. Leng, “Partitioned signature files: design issues and performance evaluation,” ACM Trans. Inf. Syst., 7, 158–180, 1989,
doi:10.1145/65935.65937. - Z. Lin, C. Faloutsos, “Frame-sliced signature files,” IEEE Transactions on Knowledge and Data Engineering, 4, 281–289, 1992, doi:10.1109/69.142018.
- E. Tousidou, P. Bozanis, Y. Manolopoulos, “Signature-based structures for objects with set-valued attributes,” Information Systems, 27, 93–121, 2002, doi:https://doi.org/10.1016/S0306-4379(01)00047-3.
- T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein, Introduction to algorithms, MIT press, 2022.
- P. Flajolet, C. Puech, “Partial match retrieval of multidimensional data,” J. ACM, 33, 371–407, 1986, doi:10.1145/5383.5453.
- L. Debnath, D. Bhatta, Integral transforms and their applications, Chapman and Hall/CRC, 2016.
- E. K. Donald, “The art of computer programming,” Sorting and searching, 3, 4, 1999.
No. of Downloads Per Month